## Word Frequency Visualizations
March 05, 2021

### Notebook introduction
This notebook builds visualizations from .csv files that are loaded as pandas DataFrames and plotted with the Seaborn package in Python. Each .csv file contains word frequency data (how many times certain words appeared) by issue month and year. To make the plots more legible, I aggregated the monthly data by year before plotting.


In [None]:
import pandas
import matplotlib.pyplot as plt
import seaborn as sns

### Creating data for visualizations
These visualizations use .csv files generated from the results of the word frequency searches.

In [None]:
# Import the safety .csv into a pandas DataFrame
dfsafe = pandas.read_csv("csv/WFREQ_safety-012321.csv", sep=',', encoding='utf8')
dfsafe.head()

In [None]:
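# Reduce the year_month value to the year only (its first four characters).
# astype(str) guards against pandas having read the column as a non-string type.
dfsafe['year'] = dfsafe['yr_m'].astype(str).str[:4]
dfsafe['year'].head()  # show the first five values to check that it worked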
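
The slice above assumes that every `yr_m` value begins with a four-digit year. Here is a minimal sketch of a sanity check for that assumption, run on a made-up toy frame. The `YYYY_MM` layout and the values are illustrative assumptions, not taken from the real .csv.

```python
import pandas

# Toy stand-in for dfsafe; the "YYYY_MM" layout is assumed for illustration only
toy = pandas.DataFrame({"yr_m": ["1950_01", "1950_02", "1951_01"]})

# Same slicing idea as in the cell above: keep the first four characters
toy["year"] = toy["yr_m"].astype(str).str[:4]

# Collect any rows whose extracted year is not exactly four digits
bad_rows = toy[~toy["year"].str.match(r"^\d{4}$")]

print(toy["year"].tolist())
print(len(bad_rows))
```

If `bad_rows` is not empty on the real data, the `yr_m` column probably uses a different layout and the slice would need adjusting.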

In [None]:
# I copied the normalized column under a display name so that the legend shows
# the word ("Safety") rather than "safety_n", my spreadsheet heading for normalized data
dfsafe['Safety'] = dfsafe['safety_n']
dfsafe['Safety'].head()

In [None]:
# Sum the word's values by year rather than by year/month
safetyval_year = dfsafe[['Safety', 'year']].groupby('year').sum()
safetyval_year.head()
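
Because each .csv file holds data for one word, the three steps above (extract the year, give the column a display name, aggregate by year) could be wrapped in a small helper and reused. This is only a sketch. The function `yearly_values`, its `how` parameter, and the toy data are hypothetical additions and not part of the original notebook. The `how` parameter makes it possible to compare the yearly sum used above with a yearly mean, which may be easier to read as a rate when the monthly values are already normalized.

```python
import pandas

def yearly_values(df, column, label, how="sum"):
    """Aggregate one word's monthly values by year (hypothetical helper)."""
    out = pandas.DataFrame({
        "year": df["yr_m"].astype(str).str[:4],  # first four characters = year
        label: df[column],                       # display name used in the legend
    })
    return out.groupby("year").agg(how)

# Made-up values for illustration; the real data comes from the .csv files
toy = pandas.DataFrame({
    "yr_m": ["1950_01", "1950_02", "1951_01"],
    "safety_n": [0.5, 1.5, 1.0],
})

totals = yearly_values(toy, "safety_n", "Safety")              # yearly sums
means = yearly_values(toy, "safety_n", "Safety", how="mean")   # yearly means

print(totals)
print(means)
```

Either result has the same shape as `safetyval_year`, so it could be passed to the plotting cell unchanged.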

### Create Visualizations

In [None]:
# Make the line graph, drawing explicitly on the axes created here
fig, ax = plt.subplots(figsize=(18, 6))

sns.lineplot(data=safetyval_year, palette='viridis', ax=ax).set_title('Use of "Safety" Over Time')

plt.xlabel('Year')
plt.ylabel('Number of words per 1000')

ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(True)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(True)

ax.legend().set_visible(True)