<a href="https://colab.research.google.com/github/ProfessorPatrickSlatraigh/CST3512/blob/main/Dewey_Dictionary.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Dewey Dictionary

Loads the Dewey Decimal System **codes** and **categories** to a dictionary using a [reference from the University of Illinois Library](https://www.library.illinois.edu/infosci/research/guides/dewey/).   

Makes the Dewey `code:category` dictionary available as a `pandas` DataFrame or as a Python Dictionary.     



First, download the `.csv` file from the ProfessorPatrickSlatraigh GitHub repository into the current working directory, saving it as `dewey_dictionary.csv`:     
* https://raw.githubusercontent.com/ProfessorPatrickSlatraigh/data/main/dewey_codes_categories.csv

In [None]:
!curl 'https://raw.githubusercontent.com/ProfessorPatrickSlatraigh/data/main/dewey_codes_categories.csv' -o dewey_dictionary.csv
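
Where `curl` is not installed, the same download can be sketched with the Python standard library alone. The helper name `fetch_csv` is hypothetical and not part of the original notebook, and the example call is left commented out because it needs network access.

```python
import shutil
import urllib.request


def fetch_csv(url, dest):
    """Copy the resource at `url` to the local file `dest` and return `dest`."""
    # Stream the response body to disk instead of loading it all into memory
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call, using the URL and file name from this notebook:
# fetch_csv(
#     "https://raw.githubusercontent.com/ProfessorPatrickSlatraigh/data/main/dewey_codes_categories.csv",
#     "dewey_dictionary.csv",
# )
```

Saving to `dewey_dictionary.csv` keeps the later cells unchanged, since they open the file by that name.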

##Housekeeping    

Import modules required:    
* **csv** - to read a csv file into a variable    
* **pandas** - for dataframes, etc.
* **numpy** - for arrays   


In [30]:
import csv    

import pandas as pd    

import numpy as np    


##Read the CSV into a Dictionary    

Create an empty `dewey_dict` dictionary and populate it with `key:value` pairs read from the two columns in the `.csv` file, excluding the header row.    


In [None]:
# Create an empty dictionary for Dewey code:category pairs   
dewey_dict = {}

try: 
    # newline='' lets the csv module handle line endings itself
    with open('dewey_dictionary.csv', mode='r', newline='') as source:
        csv_read = csv.reader(source)
        next(csv_read)              # to skip the header row in the csv_read file
        for line in csv_read: 
            # print(line)           # scaffolding to peek at lines in csv_read
            # print(line[0])        # scaffolding to peek at col 0 in csv_read
            # print(type(line[0]))  # scaffolding to peek at col 0 type in csv_read
            # wait = input('Hit Enter to continue.') # wait for scaffolding output
            dewey_dict[line[0]] = line[1] # dict entry (key=1st col, value=2nd col)
 
    print('Created the `dewey_dict` Dictionary.')
# Catch only the failures expected here: missing/unreadable file, malformed
# CSV, a row with fewer than two columns, or a completely empty file
except (OSError, csv.Error, IndexError, StopIteration) as err:
    print(f'Error encountered attempting to create `dewey_dict`: {err!r}')

Descriptive information on the `dewey_dict` Dictionary. 

In [None]:
# print the length of the dictionary (# of key:value pairs)
print(len(dewey_dict))

In [None]:
# print the populated dictionary   
print(dewey_dict)
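
The reader loop above can also be written with `csv.DictReader`, which consumes the header row itself and lets each column be addressed by name. This is a minimal sketch, not the notebook's method: it assumes the header names `dewey_code` and `dewey_category` (the column names used elsewhere in this notebook) and uses made-up sample rows held in memory rather than the downloaded file.

```python
import csv
import io

# Made-up sample standing in for dewey_dictionary.csv (assumed header names)
sample_csv = io.StringIO(
    "dewey_code,dewey_category\n"
    "000,Computer science and general works\n"
    "100,Philosophy and psychology\n"
    "200,Religion\n"
)

# DictReader consumes the header row and yields one dict per data row
reader = csv.DictReader(sample_csv)
sample_dict = {row["dewey_code"]: row["dewey_category"] for row in reader}

# The csv module returns every field as a string, so the keys are strings
# and leading zeros such as '000' are preserved
print(sample_dict["000"])
print(type(next(iter(sample_dict))))
```

Because the keys are strings, a lookup needs the code in text form, for example `sample_dict["100"]` rather than `sample_dict[100]`.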



---



##Read the .CSV into a Dataframe then Create a Dictionary    

Use pandas' built-in `read_csv()` function with a few parameters that describe the `.csv` file format, then convert the result to a dictionary with the pandas `to_dict()` method.


* `header=0` specifies that the first row of the file holds the column names rather than data.    
* `index_col=0` specifies which column supplies the labels for the DataFrame that `read_csv()` returns. In this case, the first column (position 0) holds the keys.    
* `.squeeze('columns')` converts a DataFrame with a single column into a Series. In this case, only one value column remains once the first column is used as the index. The `squeeze=True` parameter of `read_csv()` used to do the same job, but it was deprecated in pandas 1.4 and removed in pandas 2.0.    


In [None]:
try: 
    # Use pandas `read_csv` to read the file, then squeeze the single
    # remaining column into a Series
    df_dewey = pd.read_csv('dewey_dictionary.csv', header=0, index_col=0).squeeze('columns')
    
    # Use pandas `to_dict()` to assign Series index:value to dictionary
    dict_dewey = df_dewey.to_dict()
    
    print('Created `df_dewey` DataFrame and `dict_dewey` Dictionary.')
except (OSError, pd.errors.ParserError, pd.errors.EmptyDataError) as err:
    print(f'Error attempting to create `df_dewey` and/or `dict_dewey`: {err!r}')

Descriptive information on `df_dewey` (a pandas Series with `dewey_code` as the index).

In [None]:
df_dewey.describe()

In [None]:
df_dewey.head()

Descriptive information on the `dict_dewey` Dictionary.

In [None]:
print(len(dict_dewey))

In [None]:
print(dict_dewey)
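
One difference between the two approaches is worth checking. The `csv` module keeps every field as a string, while `read_csv()` infers column types, so zero-padded codes such as `000` would be read as the integer `0` unless told otherwise. The sketch below passes `dtype=str` to keep the codes as text. It assumes made-up sample rows and the header names `dewey_code` and `dewey_category`; whether the downloaded file stores its codes zero-padded is not confirmed here.

```python
import io

import pandas as pd

# Made-up sample standing in for dewey_dictionary.csv (assumed header names)
sample_csv = (
    "dewey_code,dewey_category\n"
    "000,Computer science and general works\n"
    "100,Philosophy and psychology\n"
    "200,Religion\n"
)

# Default type inference reads the code column as integers
inferred = pd.read_csv(io.StringIO(sample_csv), header=0, index_col=0).squeeze("columns")

# dtype=str keeps every column as text; the index is set after reading
as_text = (
    pd.read_csv(io.StringIO(sample_csv), header=0, dtype=str)
    .set_index("dewey_code")["dewey_category"]
)

print(list(inferred.to_dict()))  # keys inferred as integers
print(list(as_text.to_dict()))   # keys kept as strings
```

Comparing the key types of `dewey_dict` and `dict_dewey` shows which case applies to the real file.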



---



#**Exercise**    

Add your code below to take a DataFrame with Dewey Decimal System `dewey_code` and `dewey_category` columns and transform the DataFrame to include a hierarchical structure of the following columns, derived from the `dewey_code` column:    
* **dewey_level1** - based on the **first** character in `dewey_code`    
* **dewey_level2** - based on the **second** character in `dewey_code`    
* **dewey_level3** - based on the **third** character in `dewey_code`    



In [None]:
### YOUR CODE HERE ###
### add snippets below, if you like ###