# Example notebook

In this real-world example from the Living with Machines project, we have a list of newspaper titles with different abbreviations, and we need to check which identification number (`NLP`) each abbreviation is associated with within a certain date range.

## Making sure all is up-to-date

Before we get started, make sure that the latest version is installed:

In [5]:
%pip install ../dist/dated_translator-0.1.1-py3-none-any.whl

Processing /Users/kwesterling/Repositories/lwm/dated-translator/dist/dated_translator-0.1.1-py3-none-any.whl
Installing collected packages: dated-translator
Successfully installed dated-translator-0.1.1
Note: you may need to restart the kernel to use updated packages.


## Setting up

First, we import the package:

In [6]:
from dated_translator import Lookup

Then, let's set up our lookup. The file we pass to the `Lookup` object is called `JISC-papers.csv`. Let's preview it:

In [13]:
import pandas as pd

pd.read_csv("JISC-papers.csv").head(2)

| Unnamed: 0 | Newspaper Title | System ID | NLP | JISC | Normalised Title | Abbr | StartD | StartM | StartY | EndD | EndM | EndY |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Aberdeen Journal and general advertiser for th... | 13921360 | 31 | JISC1 | Aberdeen Journal | ANJO | 1 | Jan | 1800 | 23 | Aug | 1876 |
| 1 | Aberdeen Weekly Journal and general advertiser... | 13921362 | 32 | JISC1 | Aberdeen Journal | ANJO | 30 | Aug | 1876 | 31 | Dec | 1900 |


Now, we can set up our `Lookup` object.

In this example, we want the lookup to return the `NLP` **31** for any ANJO abbreviation (`Abbr`) between 1800-01-01 and 1876-08-23, and **32** for the same abbreviation between 1876-08-30 and 1900-12-31.

To set this up, we need to pass the dataset's file name and specify the column names for the lookup's term 1 (`Abbr`) and term 2 (`NLP`). _Note: It doesn't matter in which order you pass them, but which one is term 1 and which is term 2 will affect our `left_translate` and `right_translate` methods further down the line._

We also need to specify the format of the date columns in our file. Since we're not using the standard setup here (a `Start Date` and an `End Date` column respectively), we can pass a dictionary with three items, mapping the names of the year, month, and day columns to their date formats. We do so for both the start date and the end date:

In [18]:
lookup = Lookup(
    "JISC-papers.csv",
    term_1_column="Abbr",
    term_2_column="NLP",
    start_date_column={
        "StartY": "%Y",
        "StartM": "%b",
        "StartD": "%d"
    },
    end_date_column={
        "EndY": "%Y",
        "EndM": "%b",
        "EndD": "%d"
    }
)
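
To see what the format dictionaries describe, here is a minimal sketch of how three separate date columns could be combined into a single date with the standard library's `datetime.strptime`. This is only an illustration of the format codes (`%Y`, `%b`, `%d`), not the package's own implementation; the helper name `combine_date_parts` is hypothetical.

```python
from datetime import date, datetime


def combine_date_parts(row, column_formats):
    """Join separate date columns into one date using their strptime formats.

    Hypothetical helper for illustration; not part of dated_translator.
    """
    # Concatenate the values and the formats in the same order, then parse once.
    value = " ".join(str(row[column]) for column in column_formats)
    pattern = " ".join(column_formats.values())
    return datetime.strptime(value, pattern).date()


# First row of the preview above, reduced to its date columns.
row = {
    "StartY": 1800, "StartM": "Jan", "StartD": 1,
    "EndY": 1876, "EndM": "Aug", "EndD": 23,
}

start = combine_date_parts(row, {"StartY": "%Y", "StartM": "%b", "StartD": "%d"})
end = combine_date_parts(row, {"EndY": "%Y", "EndM": "%b", "EndD": "%d"})

print(start, end)  # 1800-01-01 1876-08-23
```

Note that `%b` matches abbreviated month names in the current locale, so `Jan` and `Aug` parse as expected under an English locale.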

### Lookups!

After this setup, we can run the `left_translate` method to check what the `NLP` is for the abbreviation "ANJO" on the date 1800-01-01:

In [19]:
lookup.left_translate("ANJO", "1800-01-01")

[31]

This should return the value: `[31]`, that is, a list of the possible NLPs for this abbreviation on this particular date.
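
The idea behind a dated lookup can be sketched in plain Python: keep every row whose term matches and whose date range contains the requested date. The sketch below hard-codes the two preview rows and assumes that both range boundaries are inclusive; the function name `translate_left` and the constant `ROWS` are hypothetical, and this is not the package's own implementation.

```python
from datetime import date

# The two preview rows, reduced to (Abbr, NLP, start date, end date).
ROWS = [
    ("ANJO", 31, date(1800, 1, 1), date(1876, 8, 23)),
    ("ANJO", 32, date(1876, 8, 30), date(1900, 12, 31)),
]


def translate_left(abbr, on_date, rows=ROWS):
    """Return every NLP whose row matches abbr and whose range contains on_date.

    Assumption: start and end dates are both inclusive.
    """
    day = date.fromisoformat(on_date)
    return [
        nlp
        for row_abbr, nlp, start, end in rows
        if row_abbr == abbr and start <= day <= end
    ]


print(translate_left("ANJO", "1800-01-01"))  # [31]
print(translate_left("ANJO", "1890-06-15"))  # [32]
print(translate_left("ANJO", "1876-08-25"))  # [] (falls between the two ranges)
```

Returning a list rather than a single value leaves room for dates on which several rows match, and for dates on which none do.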

Similarly, we can run the `right_translate` method to check what the `Abbr` is for a given `NLP` (31) on the date 1800-01-01:

In [21]:
lookup.right_translate(31, "1800-01-01")

['ANJO']

The result should, in a reverse of the result above, be `['ANJO']`, that is, a list of the possible abbreviations for this NLP on this particular date.

### Further examples

Note that, if you change the column names, you will get a different kind of lookup. In this example, we're looking for the normalised titles for any given NLP (and vice versa):

In [24]:
lookup = Lookup(
    "JISC-papers.csv",
    term_1_column="Normalised Title",
    term_2_column="NLP",
    start_date_column={"StartY": "%Y", "StartM": "%b", "StartD": "%d"},
    end_date_column={"EndY": "%Y", "EndM": "%b", "EndD": "%d"}
)

print(lookup.left_translate("Aberdeen Journal", "1800-01-01"))
print(lookup.right_translate(31, "1800-01-01"))

[31]
['Aberdeen Journal']
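
Since the original task involves a whole list of titles, it can be handy to run the same translation over many (term, date) pairs at once. The following is a minimal sketch: `translate_many` is a hypothetical helper, and `fake_left_translate` is a stand-in with the same call shape as `lookup.left_translate(term, date)` so that the example runs without the CSV file. In the notebook you would pass `lookup.left_translate` instead.

```python
def translate_many(translate, pairs):
    """Apply a translate callable to each (term, date) pair.

    Returns a dict keyed by the (term, date) pair.
    """
    return {(term, day): translate(term, day) for term, day in pairs}


def fake_left_translate(term, day):
    """Hypothetical stand-in for lookup.left_translate, for illustration only."""
    if term != "Aberdeen Journal":
        return []
    # ISO-formatted date strings sort chronologically, so a string comparison works here.
    return [31] if day <= "1876-08-23" else [32]


pairs = [
    ("Aberdeen Journal", "1800-01-01"),
    ("Aberdeen Journal", "1890-06-15"),
]

results = translate_many(fake_left_translate, pairs)
for (term, day), nlps in results.items():
    print(term, day, nlps)
```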
