## How do monthly salaries vary across geographical regions?

### View the dataframe information

In [55]:
import pandas as pd 
import plotly.express as px
import numpy as np

In [42]:
world_salary = pd.read_csv("https://raw.githubusercontent.com/Kingtilon1/DATA-602/main/salary_data.csv")
print(world_salary.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 221 entries, 0 to 220
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   country_name    221 non-null    object 
 1   continent_name  221 non-null    object 
 2   wage_span       221 non-null    object 
 3   median_salary   221 non-null    float64
 4   average_salary  221 non-null    float64
 5   lowest_salary   221 non-null    float64
 6   highest_salary  221 non-null    float64
dtypes: float64(4), object(3)
memory usage: 12.2+ KB
None


### View world_salary statistics

In [53]:
print(world_salary.describe())

       median_salary  average_salary  lowest_salary  highest_salary
count     221.000000      221.000000     221.000000      221.000000
mean     1762.631906     1982.339812     502.783204     8802.165619
std      1634.708716     1835.429193     470.073328     8140.210641
min         0.261335        0.285524       0.072092        1.271103
25%       567.210000      651.000000     163.930000     2900.480000
50%      1227.460000     1344.230000     339.450000     5974.360000
75%      2389.010000     2740.000000     690.000000    12050.740000
max      9836.070000    11292.900000    2850.270000    50363.930000


#### Central Tendency:

The median and average salary columns summarize the typical income level in each country.
Averaged over all 221 countries, median_salary is $1,762.63 and average_salary is $1,982.34.

#### Variability:

The standard deviations are nearly as large as the means (for example, $1,634.71 against a mean of $1,762.63 for median_salary), so salary levels vary considerably between countries.
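One way to put a number on that variability is the coefficient of variation (standard deviation divided by mean). The sketch below recomputes it from the `describe()` figures printed above; the `summary` frame is a hand-copied stand-in for `world_salary.describe()`, and the variable names are illustrative only.

```python
import pandas as pd

# Mean and std copied from the describe() output above
# (a stand-in for world_salary.describe().loc[["mean", "std"]]).
summary = pd.DataFrame(
    {
        "median_salary": [1762.631906, 1634.708716],
        "average_salary": [1982.339812, 1835.429193],
        "lowest_salary": [502.783204, 470.073328],
        "highest_salary": [8802.165619, 8140.210641],
    },
    index=["mean", "std"],
)

# Coefficient of variation: spread relative to the mean, per column.
cv = summary.loc["std"] / summary.loc["mean"]
print(cv.round(2))
```

Every ratio comes out a little above 0.9, which means the spread between countries is almost as large as the typical salary itself.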

#### Range:

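The minimum and maximum values show how wide the global salary range is.
Across countries, highest_salary reaches $50,363.93 and lowest_salary falls to $0.07; the means of those two columns are $8,802.17 and $502.78, respectively. Minimums below $1 are implausible as monthly salaries and warrant a closer look at the source data.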

#### Distribution Percentiles:

Quartiles (25%, 50%, 75%) describe the shape of the distribution.
For instance, 75% of countries have a median_salary of $2,389.01 or less.

#### Global Salary Trends:

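The median of median_salary ($1,227.46) sits well below its mean ($1,762.63), and the same holds for average_salary ($1,344.23 against $1,982.34).
A mean above the median indicates a right-skewed distribution: most countries cluster at lower salary levels, while a smaller group pays far more.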

#### Impact of Outliers:

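A few very high values stretch the distribution: the maximum median_salary ($9,836.07) is about four times the 75th percentile ($2,389.01), which pulls the mean and the standard deviation upward.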
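A rough check for high outliers is Tukey's 1.5 × IQR rule. This is a sketch built on the quartiles printed by `describe()` above rather than on the full data set; the `stats` series and the `upper_fence` name are illustrative assumptions.

```python
import pandas as pd

# Quartiles and maximum of median_salary, copied from the describe() output above.
stats = pd.Series({"25%": 567.21, "50%": 1227.46, "75%": 2389.01, "max": 9836.07})

# Tukey's rule: values above Q3 + 1.5 * IQR count as high outliers.
iqr = stats["75%"] - stats["25%"]
upper_fence = stats["75%"] + 1.5 * iqr
has_high_outliers = stats["max"] > upper_fence

print(f"upper fence: {upper_fence:.2f}, max: {stats['max']:.2f}, above fence: {has_high_outliers}")
```

On the full data frame, the same fence could be used as a filter, for example `world_salary[world_salary["median_salary"] > upper_fence]`, to list the countries involved.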

### Check for any missing values

In [44]:
missing_values = world_salary.isnull().sum()
print(missing_values)

country_name      0
continent_name    0
wage_span         0
median_salary     0
average_salary    0
lowest_salary     0
highest_salary    0
dtype: int64


### There are no missing values within the data frame

### Rename the column continent_name to geographical_region to better describe the data

In [45]:
world_salary.rename(columns={'continent_name': 'geographical_region'}, inplace=True)

### View the distribution of average salaries

In [46]:
fig = px.box(world_salary, x='geographical_region', y='average_salary', title='Distribution of Average Salary by Geographical Region')
fig.update_xaxes(title='Geographical Region')
fig.update_yaxes(title='Average Salary')
fig.show()

### Sort the data frame from highest to lowest salary

In [47]:
## sort the data frame from highest to lowest average salary
sorted_df = world_salary.sort_values(by=['average_salary'], ascending=[False])
print(sorted_df.head())

       country_name geographical_region wage_span  median_salary  \
192     Switzerland              Europe   Monthly        9836.07   
83         Guernsey              Europe   Monthly        8689.02   
209   United States    Northern America   Monthly        6966.00   
35           Canada    Northern America   Monthly        6311.03   
208  United Kingdom              Europe   Monthly        6300.00   

     average_salary  lowest_salary  highest_salary  
192        11292.90        2850.27        50363.93  
83          9409.76        2367.07        41869.51  
209         7925.00        2000.00        35250.00  
35          7352.94        1850.00        32720.59  
208         7235.37        1829.27        32214.63  


### After sorting the data frame, we see that Switzerland has the highest average salary at $11,292.90, followed by Guernsey and then the United States

## Comparing median and average salaries across geographical regions

I hypothesize that Northern America will have a higher median and average salary than the other geographical regions.

In [48]:
## build one data frame of mean average salaries and one of median salaries per region
region_salary = world_salary.groupby('geographical_region')['average_salary'].mean().reset_index()

region_median_salary = world_salary.groupby('geographical_region')['median_salary'].median().reset_index()
print(region_median_salary)

  geographical_region  median_salary
0              Africa        525.390
1                Asia        946.205
2           Caribbean       1125.295
3     Central America       1575.985
4              Europe       3021.140
5       North America       2324.080
6    Northern America       4918.970
7             Oceania       1330.000
8       South America       1131.500
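The two `groupby` calls above could also be folded into one using pandas named aggregation. The sketch below assumes nothing beyond the column names already in use; `sample` is a five-row stand-in copied from the sorted output earlier in this notebook, and the output column names are hypothetical.

```python
import pandas as pd

# Five rows copied from the sorted output above; a stand-in for world_salary.
sample = pd.DataFrame(
    {
        "country_name": ["Switzerland", "Guernsey", "United States", "Canada", "United Kingdom"],
        "geographical_region": ["Europe", "Europe", "Northern America", "Northern America", "Europe"],
        "median_salary": [9836.07, 8689.02, 6966.00, 6311.03, 6300.00],
        "average_salary": [11292.90, 9409.76, 7925.00, 7352.94, 7235.37],
    }
)

# One pass over the groups: mean of average_salary and median of median_salary.
region_summary = (
    sample.groupby("geographical_region")
    .agg(
        mean_average_salary=("average_salary", "mean"),
        median_of_median_salary=("median_salary", "median"),
    )
    .reset_index()
)
print(region_summary)
```

Keeping both statistics in one frame makes it easier to plot or compare them side by side; swapping `sample` for `world_salary` should give the per-region figures for the full data set.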


In [49]:
## plot the mean average salary for each geographical region
fig = px.bar(region_salary, x ="geographical_region", y="average_salary", color="geographical_region", pattern_shape="geographical_region")
fig.show()

### Visualize the median salaries 

In [50]:

fig = px.bar(region_median_salary, x='geographical_region', y='median_salary', color = 'geographical_region', title="Median Salary by Geographical Region")
fig.show()

### The graphs show that Northern America does have a higher median salary and a higher average salary than the other geographical regions
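The median part of that conclusion can also be checked without reading it off the chart. The sketch below rebuilds the printed `region_median_salary` table by hand and ranks it; only the median figures appear in the notebook output, so the average-salary claim is not covered here. Note that the data contains both a "North America" and a "Northern America" label.

```python
import pandas as pd

# Per-region medians copied from the printed region_median_salary table above.
region_median_salary = pd.DataFrame(
    {
        "geographical_region": [
            "Africa", "Asia", "Caribbean", "Central America", "Europe",
            "North America", "Northern America", "Oceania", "South America",
        ],
        "median_salary": [
            525.390, 946.205, 1125.295, 1575.985, 3021.140,
            2324.080, 4918.970, 1330.000, 1131.500,
        ],
    }
)

# Rank regions from highest to lowest median salary.
ranked = region_median_salary.sort_values("median_salary", ascending=False)
top_region = ranked.iloc[0]["geographical_region"]
print(ranked.head(3))
```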

## Extra

### Is there a correlation between geographical region and average salaries?

In [51]:
fig = px.scatter(region_salary, x="geographical_region", y="average_salary", color="geographical_region", size="average_salary")
fig.show()
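Because `geographical_region` is categorical, a scatter plot shows differences between groups but does not yield a correlation coefficient. One common substitute, which is not part of the original analysis, is the correlation ratio η²: the share of salary variance explained by region. The sketch below is a minimal version under that assumption; `correlation_ratio` is a hypothetical helper, and `sample` is a five-row stand-in copied from the sorted output earlier in this notebook.

```python
import pandas as pd


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Return eta squared: between-group variance as a share of total variance."""
    grand_mean = values.mean()
    # Replace every value by the mean of its own group.
    group_means = values.groupby(categories).transform("mean")
    ss_between = ((group_means - grand_mean) ** 2).sum()
    ss_total = ((values - grand_mean) ** 2).sum()
    return float(ss_between / ss_total)


# Five rows copied from the sorted output above; a stand-in for world_salary.
sample = pd.DataFrame(
    {
        "geographical_region": ["Europe", "Europe", "Northern America", "Northern America", "Europe"],
        "average_salary": [11292.90, 9409.76, 7925.00, 7352.94, 7235.37],
    }
)

eta_sq = correlation_ratio(sample["geographical_region"], sample["average_salary"])
print(f"eta squared: {eta_sq:.3f}")
```

A value near 0 would mean region explains little of the salary variation, and a value near 1 would mean it explains almost all of it. The figure for five hand-picked rows says nothing about the full data set; passing the `world_salary` columns instead would give the relevant number.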

In [56]:
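## the data has no time dimension, so assign each country a random placeholder year (2000-2022) to drive the animation
world_salary['years'] = np.random.randint(2000, 2023, size=len(world_salary))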


### Trying out animations for fun

In [61]:
## a categorical x axis cannot be log-scaled, and the y range must cover salaries up to about 11,300
fig = px.scatter(world_salary.sort_values("years"), x="geographical_region", y="average_salary",
                 animation_frame="years", animation_group="country_name",
                 size="average_salary", color="median_salary", hover_name="country_name",
                 size_max=55, range_y=[0, 12000])
fig.show()