# Exercise Eight: Cultural Data
For this exercise, pick a dataset of current or historical cultural data available in CSV format from one of the suggested sources or from your own research. This will be easiest if your data includes some element of information over time. You'll be using the "group by" methods we discussed this week to make comparisons within the dataset: you might group by geography, party affiliation, age, gender, or other information marked in the dataset.

Your annotated code should include headings, and discuss your findings as well as the limitations in what you can visualize using this approach. This exercise should:

- Import your selected structured CSV data
- Use Pandas to note any preliminary trends in the CSV as a dataframe
- Use "group by" to break down at least two different subsets of data
- Plot a comparison between the grouped data (this will be easiest over time)
- Use one additional form of analysis or visualization from any we've tried so far


## Stage One: Import Structured CSV Data (bonus: from multiple files)



In [3]:

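import os
import pandas as pd

path = "flix/"
years = []
df_list = []

# os.scandir() yields entries in arbitrary order, so sort by name to keep
# years and df_list in a predictable order.
with os.scandir(path) as entries:
    for entry in sorted(entries, key=lambda e: e.name):
        years.append(entry.name)
        # Reading with the default UTF-8 codec failed on these files with:
        #   UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd2 in position 7431: invalid continuation byte
        # latin-1 can decode any byte; this assumes a single-byte Western encoding.
        temp_df = pd.read_csv(os.path.join(path, entry.name), encoding='latin-1')
        df_list.append(temp_df)

print(years[1])
print(df_list[1].head())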
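When several yearly files are loaded, it can help to record which file each row came from so that the years can be compared later. The following is a minimal sketch rather than the method used in this notebook: the temporary folder, the file names `2015.csv` and `2016.csv`, and the added `Year` column are all hypothetical stand-ins, and it assumes each file is named after its year.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the data folder: two tiny yearly files.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "2015.csv").write_text("Country,Score\nNorway,7.5\nTogo,2.8\n", encoding="latin-1")
(tmp_dir / "2016.csv").write_text("Country,Score\nNorway,7.4\nTogo,3.3\n", encoding="latin-1")

frames = []
# sorted() gives a stable, chronological order; glob() skips non-CSV files.
for csv_path in sorted(tmp_dir.glob("*.csv")):
    frame = pd.read_csv(csv_path, encoding="latin-1")
    # Assumption: the file name (without extension) is the year.
    frame["Year"] = int(csv_path.stem)
    frames.append(frame)

# One long table with a Year column, ready for grouping by year.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Keeping the year as a column, rather than only as a position in a list, means a later "group by" can use it directly.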

## Stage Two: Explore and note differences in headings / datatypes over the years

In [None]:
for df in df_list:
    print(df.dtypes)


In [None]:
for df in df_list:
    print(df['Country'].head())
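Reading through the printed dtypes works for a handful of files, but differences in headings can also be listed directly. Here is a small sketch, assuming two hypothetical frames (`df_a`, `df_b`) whose column names differ; the labels `year_a` and `year_b` are placeholders, not names from the dataset.

```python
import pandas as pd

# Hypothetical frames standing in for two years with different headings.
df_a = pd.DataFrame({"Country": ["Norway"], "Region": ["Western Europe"], "Score": [7.5]})
df_b = pd.DataFrame({"Country": ["Norway"], "Happiness Score": [7.4]})
frames = {"year_a": df_a, "year_b": df_b}

# Columns present in every frame are safe to compare across years.
shared = set.intersection(*(set(df.columns) for df in frames.values()))
print("Shared:", sorted(shared))

# Columns that appear somewhere but are missing from a given frame.
all_columns = set.union(*(set(df.columns) for df in frames.values()))
for label, df in frames.items():
    print(label, "lacks:", sorted(all_columns - set(df.columns)))
```

A column that is missing from one year may simply have been renamed, so the output is a prompt for closer inspection rather than a final answer.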


## Stage Three: Use Groupby to Explore (by Region)
Note: this sample shows only one grouping; the exercise itself requires two.

In [None]:
close_region = df_list[1].groupby('Region')
close_region.get_group('Western Europe').head()

In [None]:
print(close_region['Score'].mean())
print(close_region['Health (Life Expectancy)'].mean())
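Several statistics per group can also be gathered in one table instead of printing each mean separately. The sketch below uses pandas named aggregation on a small made-up table; the region names, country names, and values are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Made-up data with the same kind of columns as the grouped frame above.
scores = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Country": ["A", "B", "C", "D"],
    "Score": [7.0, 6.0, 4.0, 5.0],
    "Freedom": [0.6, 0.4, 0.3, 0.5],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function).
summary = scores.groupby("Region").agg(
    countries=("Country", "count"),
    mean_score=("Score", "mean"),
    max_freedom=("Freedom", "max"),
)
print(summary)
```

Including a count alongside the mean makes it easier to see when a regional average rests on only a few countries.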

## Stage Four: Plot Grouped Data


In [None]:
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams.update({'font.size': 7})

# The index of the grouped means holds the region names in the same order as the bars.
names = [name.replace(' ', '\n') for name in close_region['Score'].mean().index]

print(names[1])
x = np.arange(len(names))
width = 1/len(names)
fig, ax = plt.subplots()
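# Three bars per region, offset by one bar width on either side of each tick.
rects1 = ax.bar(x - width, close_region['Score'].mean(), width, label='Happiness')
# Health and Freedom are multiplied by 10 so their bars are visible alongside Score.
rects2 = ax.bar(x, close_region['Health (Life Expectancy)'].mean()*10, width, label='Life Expectancy')
rects3 = ax.bar(x + width, close_region['Freedom'].mean()*10, width, label='Freedom')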


ax.set_ylabel('Rankings')
ax.set_title(years[1])
ax.set_xticks(x)
ax.set_xticklabels(names)
ax.legend()
fig.tight_layout()

plt.show()
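The bar chart above covers a single year. For a comparison over time, one possible approach is to group by both year and region and reshape the result so that each region becomes a column. This is only a sketch under assumptions: the `Year` column and all values in `long_df` are invented, and it presumes the yearly files have already been combined into one table.

```python
import pandas as pd

# Invented long-format data: one row per country per year.
long_df = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2016, 2016, 2016],
    "Region": ["North", "North", "South", "North", "North", "South"],
    "Score": [7.0, 6.0, 4.0, 7.5, 6.5, 5.0],
})

# Mean score per (Year, Region), then move Region into the columns:
# rows are years, columns are regions.
by_year = long_df.groupby(["Year", "Region"])["Score"].mean().unstack("Region")
print(by_year)
```

With matplotlib installed, calling `by_year.plot()` on a table shaped like this draws one line per region with the years along the x-axis.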

## Bonus Stage:

