In [22]:
import pandas as pd

df = pd.read_csv('C:/dataflights/untidy/untidy_religion.csv')
df

Unnamed: 0,Religion,<10k,10-20k,20-30k,30-40k,40-50k,50-75k,75-100k,100-150k,>150k,refused
0,agnostic,27,34,60,81,76,137,122,109,84,96
1,atheist,12,27,37,52,35,70,73,59,74,76
2,buddhist,27,21,30,34,33,58,62,39,53,54
3,catholic,418,617,732,670,638,1116,949,792,633,1489
4,refused,15,14,15,11,10,35,21,17,18,116


#### Reshaping the data
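- The data arrived in wide format: the income brackets, which are values of a single variable, were spread across the column headers. To fix this problem, I used the melt() method to reshape the data into long format. I used the Religion column as the identifier column, income as the name of the new variable column, and count as the name of the value column.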

In [23]:
# var_name takes a single column name because the columns are not a MultiIndex
df = df.melt(id_vars=["Religion"], var_name='income', value_name='count')
df.head(10)

Unnamed: 0,Religion,income,count
0,agnostic,<10k,27
1,atheist,<10k,12
2,buddhist,<10k,27
3,catholic,<10k,418
4,refused,<10k,15
5,agnostic,10-20k,34
6,atheist,10-20k,27
7,buddhist,10-20k,21
8,catholic,10-20k,617
9,refused,10-20k,14
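
To make the reshaping step easier to follow, here is a minimal sketch on a two-row sample in the same wide layout. The names `wide`, `long_df`, and `round_trip` are only illustrative and are not part of the notebook. Pivoting the long table back to wide is one way to check that no values were lost.

```python
import pandas as pd

# Hypothetical two-row sample in the same wide layout as the survey table.
wide = pd.DataFrame({
    "Religion": ["agnostic", "atheist"],
    "<10k": [27, 12],
    "10-20k": [34, 27],
})

# Each income column header becomes a value of the new 'income' column.
long_df = wide.melt(id_vars="Religion", var_name="income", value_name="count")
print(long_df)

# Pivoting back recovers one column per income bracket.
round_trip = long_df.pivot(index="Religion", columns="income", values="count")
print(round_trip)
```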


#### Grouping the data
- I decided to group the data by religion so that I could calculate measures for each religion individually. I then created one variable per religion, each holding that religion's group as a separate data frame.

In [24]:
# Split the long table into one group per religion
group = df.groupby('Religion')
# Pull each religion's rows out as its own DataFrame
agnostic = group.get_group('agnostic')
atheist = group.get_group('atheist')
buddhist = group.get_group('buddhist')
catholic = group.get_group('catholic')
refused = group.get_group('refused')

#### Calculations
- The following code sums the counts across every religion and income bracket to find how many people were surveyed in total.

In [25]:
# Select 'count' explicitly so the text column 'income' is left out of the sum
surveyed = group[['count']].sum().sum()
surveyed

count    10078
dtype: int64
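
The same grouping can also report a subtotal for each religion before adding everything up. The sketch below is only an illustration: it uses a small sample (the first two income brackets for three religions, copied from the melted table), and the names `sample`, `per_religion`, and `total` are hypothetical.

```python
import pandas as pd

# Sample rows copied from the melted table (first two income brackets only).
sample = pd.DataFrame({
    "Religion": ["agnostic", "atheist", "buddhist"] * 2,
    "income": ["<10k"] * 3 + ["10-20k"] * 3,
    "count": [27, 12, 27, 34, 27, 21],
})

# Subtotal of respondents for each religion.
per_religion = sample.groupby("Religion")["count"].sum()
print(per_religion)

# Grand total across all religions in the sample.
total = per_religion.sum()
print(total)
```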

#### To make the agnostic data frame easier to work with, I decided to reset its index

In [26]:
agnostic = agnostic.reset_index() 
agnostic

Unnamed: 0,index,Religion,income,count
0,0,agnostic,<10k,27
1,5,agnostic,10-20k,34
2,10,agnostic,20-30k,60
3,15,agnostic,30-40k,81
4,20,agnostic,40-50k,76
5,25,agnostic,50-75k,137
6,30,agnostic,75-100k,122
7,35,agnostic,100-150k,109
8,40,agnostic,>150k,84
9,45,agnostic,refused,96


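#### Calculations
- The following code gets the percentage of agnostics who make below 100k. Resetting the index gave the rows consecutive labels from 0 to 9, so I could pull rows 0 through 6 (the brackets from <10k to 75-100k) with a single label slice; .loc includes the end label. The total in the denominator includes respondents who refused to state their income.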

In [27]:
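# Total number of agnostic respondents, including those who refused to state income
all_ag = agnostic['count'].sum()
# Rows 0-6 are the brackets from <10k through 75-100k (.loc includes label 6)
below_100 = agnostic.loc[:6, ['count']].sum()
percent_ag = below_100 / all_ag * 100
percent_ag.round()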

count    65.0
dtype: float64
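
As an alternative that does not depend on row positions, the brackets below 100k can be selected by their labels, which also makes it possible to compute the percentage for several religions at once. This is a sketch under assumptions, not the notebook's method: it uses only the agnostic and atheist rows from the survey table, and the names `brackets`, `sample`, `below_100k`, and `percent_below` are illustrative.

```python
import pandas as pd

# Agnostic and atheist rows copied from the survey table.
brackets = ["<10k", "10-20k", "20-30k", "30-40k", "40-50k",
            "50-75k", "75-100k", "100-150k", ">150k", "refused"]
sample = pd.DataFrame({
    "Religion": ["agnostic"] * 10 + ["atheist"] * 10,
    "income": brackets * 2,
    "count": [27, 34, 60, 81, 76, 137, 122, 109, 84, 96,
              12, 27, 37, 52, 35, 70, 73, 59, 74, 76],
})

# The first seven labels are the brackets from "<10k" through "75-100k".
below_100k = brackets[:7]
is_below = sample["income"].isin(below_100k)

# Per-religion totals and per-religion counts below 100k.
totals = sample.groupby("Religion")["count"].sum()
below = sample[is_below].groupby("Religion")["count"].sum()

# The two Series share the Religion index, so the division aligns by label.
percent_below = (below / totals * 100).round()
print(percent_below)
```

Selecting by label keeps the calculation correct even if the rows are reordered or the index is not reset.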