# Occupation Analysis

### Introduction:
This notebook analyzes user data to compute age and gender statistics for each occupation.


## Import the necessary libraries and load the dataset

In [2]:
import pandas as pd
users = pd.read_table('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user', sep='|', index_col='user_id')
users.head()

user_id,age,gender,occupation,zip_code
1,24,M,technician,85711
2,53,F,other,94043
3,23,M,writer,32067
4,24,M,technician,43537
5,33,F,other,15213


Explanation: pd.read_table reads the dataset from the given URL; sep='|' specifies the column separator and index_col='user_id' sets the user_id column as the index. users.head() then displays the first five rows of the dataset.
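
The effect of sep and index_col can be checked offline on a small in-memory sample. This is a minimal sketch: the three-row sample text and the name sample_users are made up for illustration, and pd.read_csv with sep='|' is used here in place of pd.read_table, which should parse this input the same way.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited text that mimics the layout of u.user
sample = io.StringIO(
    "user_id|age|gender|occupation|zip_code\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
    "3|23|M|writer|32067\n"
)

# sep="|" splits columns on the pipe character;
# index_col="user_id" turns that column into the row index
sample_users = pd.read_csv(sample, sep="|", index_col="user_id")

print(sample_users.index.name)
print(list(sample_users.columns))
```

Because user_id becomes the index, it no longer appears among the regular columns, and rows can be looked up with sample_users.loc[2].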

## Discover the mean age per occupation
Group by occupation and calculate the mean age


Explanation:
users.groupby('occupation') groups the data by the occupation column.
.age.mean() calculates the mean age for each occupation.

In [3]:
mean_age_per_occupation = users.groupby('occupation').age.mean()
mean_age_per_occupation

occupation
administrator    38.746835
artist           31.392857
doctor           43.571429
educator         42.010526
engineer         36.388060
entertainment    29.222222
executive        38.718750
healthcare       41.562500
homemaker        32.571429
lawyer           36.750000
librarian        40.000000
marketing        37.615385
none             26.555556
other            34.523810
programmer       33.121212
retired          63.071429
salesman         35.666667
scientist        35.548387
student          22.081633
technician       33.148148
writer           36.311111
Name: age, dtype: float64
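
To check what the grouping does on numbers small enough to verify by hand, the same pattern can be run on a toy frame. The data and the names toy and toy_mean are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy data: two writers and three doctors
toy = pd.DataFrame({
    "occupation": ["writer", "writer", "doctor", "doctor", "doctor"],
    "age": [30, 40, 50, 60, 70],
})

# One mean per distinct occupation; the result is a Series indexed by occupation
toy_mean = toy.groupby("occupation")["age"].mean()
print(toy_mean)
```

The writers average (30 + 40) / 2 = 35 and the doctors (50 + 60 + 70) / 3 = 60, which is what the grouped mean returns.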

## Discover the Male ratio per occupation and sort it from the most to the least

In [4]:
def gender_to_numeric(x):
    if x =='M':
        return 1
    if x =='F':
        return 0
users['gender_n'] = users['gender'].apply(gender_to_numeric)


Explanation:
The function gender_to_numeric converts gender values to numbers: 1 for male ('M') and 0 for female ('F'). Any other value falls through both checks and returns None.
users['gender'].apply(gender_to_numeric) applies the function to every value in the gender column.
The result is stored in a new column, gender_n.
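
As an alternative sketch (not the method used in this notebook), the same encoding can be written without a helper function by passing a dictionary to Series.map. The frame toy is hypothetical.

```python
import pandas as pd

# Hypothetical toy column with the same 'M' / 'F' coding as the dataset
toy = pd.DataFrame({"gender": ["M", "F", "M"]})

# map looks each value up in the dictionary: 'M' -> 1, 'F' -> 0
toy["gender_n"] = toy["gender"].map({"M": 1, "F": 0})
print(toy)
```

Values that are missing from the dictionary would come back as NaN, which plays the same role as the None returned by gender_to_numeric for unexpected input.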
### Calculate the male ratio per occupation

In [5]:
male_ratio_per_occupation = (users.groupby('occupation').gender_n.sum() / users.occupation.value_counts()) * 100 

Explanation:

users.groupby('occupation').gender_n.sum() calculates the total number of males per occupation.
users.occupation.value_counts() gives the total count of users per occupation.
The male count is divided by the total count for each occupation and multiplied by 100 to give the percentage of male users.
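
Since gender_n only holds 0 and 1, its mean within a group is already the male share of that group. The sketch below compares both routes on hypothetical toy data; the names toy, ratio_sum_count and ratio_mean are illustrative.

```python
import pandas as pd

# Hypothetical toy data: 3 of 4 writers are male, both doctors are male
toy = pd.DataFrame({
    "occupation": ["writer", "writer", "writer", "writer", "doctor", "doctor"],
    "gender_n":   [1, 0, 1, 1, 1, 1],
})

# Route used above: sum of the 0/1 flags divided by the group size
ratio_sum_count = (
    toy.groupby("occupation")["gender_n"].sum()
    / toy["occupation"].value_counts()
) * 100

# Shorter route: the mean of a 0/1 column is the fraction of ones
ratio_mean = toy.groupby("occupation")["gender_n"].mean() * 100

print(ratio_sum_count)
print(ratio_mean)
```

Both routes give 75 percent for writers and 100 percent for doctors on this toy data.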

### Sort the male ratio from the most to the least

In [6]:
male_ratio_per_occupation_sorted = male_ratio_per_occupation.sort_values(ascending=False)
male_ratio_per_occupation_sorted


occupation
doctor           100.000000
engineer          97.014925
technician        96.296296
retired           92.857143
programmer        90.909091
executive         90.625000
scientist         90.322581
entertainment     88.888889
lawyer            83.333333
salesman          75.000000
educator          72.631579
student           69.387755
other             65.714286
marketing         61.538462
writer            57.777778
none              55.555556
administrator     54.430380
artist            53.571429
librarian         43.137255
healthcare        31.250000
homemaker         14.285714
dtype: float64

male_ratio_per_occupation.sort_values(ascending=False) sorts the male ratios in descending order.

## For each occupation, calculate the minimum and maximum ages
Group by occupation and calculate the minimum and maximum ages


In [7]:
min_max_age_per_occupation = users.groupby(['occupation']).age.agg(['min','max'])
min_max_age_per_occupation

occupation,min,max
administrator,21,70
artist,19,48
doctor,28,64
educator,23,63
engineer,22,70
entertainment,15,50
executive,22,69
healthcare,22,62
homemaker,20,50
lawyer,21,53


Explanation:
users.groupby('occupation') groups the data by occupation.
.age.agg(['min', 'max']) calculates the minimum and maximum ages for each occupation.
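
If more descriptive column names are wanted, pandas also accepts keyword-style (named) aggregation on a grouped column. This is a small sketch on hypothetical data; the output names youngest and oldest are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "occupation": ["writer", "writer", "doctor", "doctor", "doctor"],
    "age": [30, 40, 50, 60, 70],
})

# Each keyword becomes an output column; its value names the aggregation
age_span = toy.groupby("occupation")["age"].agg(youngest="min", oldest="max")
print(age_span)
```

The result has the same shape as the ['min', 'max'] version, only with the chosen column labels.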


## For each combination of occupation and gender, calculate the mean age
Group by occupation and gender, and calculate the mean age (the median is included as well for comparison)

In [14]:
mean_age_per_occupation_gender = users.groupby(['occupation', 'gender']).age.agg(['mean','median'])
mean_age_per_occupation_gender


occupation,gender,mean,median
administrator,F,40.638889,38.5
administrator,M,37.162791,35.0
artist,F,30.307692,30.0
artist,M,32.333333,32.0
doctor,M,43.571429,45.0
educator,F,39.115385,40.5
educator,M,43.101449,44.0
engineer,F,29.5,29.5
engineer,M,36.6,36.0
entertainment,F,31.0,31.0


Explanation:
users.groupby(['occupation', 'gender']) groups the data by both occupation and gender.
.age.agg(['mean', 'median']) calculates the mean and the median age for each group.
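
Grouping by two keys produces a two-level index, which can be hard to scan. One possible way to compare genders side by side is to pivot the gender level into columns with unstack. The sketch below uses hypothetical toy data in which, like doctor in the output above, one occupation has no female rows.

```python
import pandas as pd

# Hypothetical toy data: artists of both genders, doctors that are all male
toy = pd.DataFrame({
    "occupation": ["artist", "artist", "artist", "doctor"],
    "gender":     ["F", "M", "M", "M"],
    "age":        [30, 20, 40, 45],
})

# Two grouping keys give one row per (occupation, gender) pair
stats = toy.groupby(["occupation", "gender"])["age"].agg(["mean", "median"])

# Move the gender level from the row index into the columns
wide = stats["mean"].unstack("gender")
print(wide)
```

Combinations that do not occur in the data, here female doctors, show up as NaN in the wide table rather than as a missing row.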

## For each occupation present the percentage of women and men
Create a data frame and apply count to gender

In [9]:
gender_ocup = users.groupby(['occupation', 'gender']).agg({'gender': 'count'})


Explanation:

users.groupby(['occupation', 'gender']).agg({'gender': 'count'}) counts the number of each gender within each occupation.

### Create a DataFrame and apply count for each occupation

In [10]:
occup_count = users.groupby(['occupation']).agg('count')


Explanation:

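users.groupby(['occupation']).agg('count') counts the non-null values in each remaining column for each occupation. When a column has no missing values, its count equals the total number of users in that occupation.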

### Divide gender_ocup by occup_count and multiply by 100

In [11]:
occup_gender = gender_ocup.div(occup_count, level="occupation") * 100

Explanation:

gender_ocup.div(occup_count, level="occupation") divides the gender-specific counts by the total counts, matching rows on the occupation level of the index so that each occupation's total is applied to each of its gender rows.
The result is multiplied by 100 to get the percentage.
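
The level argument is the part of this step that is easiest to misread, so here is a minimal sketch of the same broadcast on toy data. For brevity it uses Series built with size() instead of the DataFrames above; the names toy, pair_counts, occupation_counts and pair_pct are illustrative.

```python
import pandas as pd

# Hypothetical toy data: 1 female and 3 male artists, 2 male doctors
toy = pd.DataFrame({
    "occupation": ["artist", "artist", "artist", "artist", "doctor", "doctor"],
    "gender":     ["F", "M", "M", "M", "M", "M"],
})

# Counts per (occupation, gender) pair: a Series with a two-level index
pair_counts = toy.groupby(["occupation", "gender"]).size()

# Counts per occupation: a Series indexed by occupation only
occupation_counts = toy.groupby("occupation").size()

# level="occupation" aligns the two on that level, so each occupation
# total is reused for every gender row of the same occupation
pair_pct = pair_counts.div(occupation_counts, level="occupation") * 100
print(pair_pct)
```

On this data the artist rows come out as 25 and 75 percent and the single doctor row as 100 percent, so the shares within each occupation add up to 100.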

### Present all rows from the 'gender' column

In [12]:
occup_gender_percentage = occup_gender.loc[:, 'gender']
occup_gender_percentage

occupation     gender
administrator  F          45.569620
               M          54.430380
artist         F          46.428571
               M          53.571429
doctor         M         100.000000
educator       F          27.368421
               M          72.631579
engineer       F           2.985075
               M          97.014925
entertainment  F          11.111111
               M          88.888889
executive      F           9.375000
               M          90.625000
healthcare     F          68.750000
               M          31.250000
homemaker      F          85.714286
               M          14.285714
lawyer         F          16.666667
               M          83.333333
librarian      F          56.862745
               M          43.137255
marketing      F          38.461538
               M          61.538462
none           F          44.444444
               M          55.555556
other          F          34.285714
               M          65.714286
progra

Explanation:
occup_gender.loc[:, 'gender'] selects all rows from the gender column to present the percentages of women and men for each occupation.
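
As a more compact alternative (a sketch, not the approach taken in this notebook), the three steps above can be collapsed into a single grouped value_counts call with normalize=True, which returns shares instead of counts. The toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: 1 female and 3 male artists, 2 male doctors
toy = pd.DataFrame({
    "occupation": ["artist", "artist", "artist", "artist", "doctor", "doctor"],
    "gender":     ["F", "M", "M", "M", "M", "M"],
})

# normalize=True turns the per-occupation counts into fractions that sum to 1
gender_pct = toy.groupby("occupation")["gender"].value_counts(normalize=True) * 100
print(gender_pct)
```

The result is again indexed by occupation and gender, although within each occupation the rows are ordered by frequency rather than alphabetically.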