<br>
<h1 align="center" font="Constantia" style="font-family:Constantia;font-weight:bold;font-size: 37px"><b><u>Coding and Implementation</u></b></h1>
<br>
<span style="font-family:Arial;font-size: 21px">
The code for the project is available on GitHub. <br>
    <i><b>Link</b></i>: <i><a href="https://github.com/P-Manoj-Kumar/ip-final-project">https://github.com/P-Manoj-Kumar/ip-final-project</a></i>
</span>
<br>
<br>
 
<i>**loader.py**</i>

<font size=2>

**Importing necessary libraries:**
```python
import country_converter as coco
import pandas as pd
import streamlit as st
from sqlalchemy import create_engine
```
<br>
    
**Loading data from MySQL table:**
```python
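def load_data() -> pd.DataFrame:
    # Build the engine first so that a failure here cannot leave
    # `connection` undefined when cleanup runs.
    engine = create_engine(
        f"mysql+pymysql://root:{st.secrets.root_password}@localhost:3306/data"
    )

    # The context manager closes the connection even if the read fails.
    with engine.connect() as connection:
        df = pd.read_sql_table(table_name="salaries", con=connection, schema="data")

    return clean_df(df)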
```
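
The same open, read, close pattern can be tried without a MySQL server. The sketch below is only an illustration: it swaps in the standard-library `sqlite3` module with an in-memory database, and the `read_salaries` helper and the two rows are made up for the example rather than taken from the project.

```python
import sqlite3
from contextlib import closing


def read_salaries() -> list[tuple[str, int]]:
    """Read all rows from a throwaway in-memory table (hypothetical data)."""
    # closing() guarantees connection.close() runs even if a query raises.
    with closing(sqlite3.connect(":memory:")) as connection:
        connection.execute(
            "CREATE TABLE salaries (job_title TEXT, salary_in_usd INTEGER)"
        )
        connection.executemany(
            "INSERT INTO salaries VALUES (?, ?)",
            [("Data Scientist", 100000), ("Data Engineer", 90000)],
        )
        rows = connection.execute(
            "SELECT job_title, salary_in_usd FROM salaries ORDER BY salary_in_usd"
        ).fetchall()
    return rows


rows = read_salaries()
```

Here `rows` holds the two tuples ordered by salary, and the connection is already closed by the time the function returns.
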
<br>
    
**Cleaning the Data:**
```python
def clean_df(df: pd.DataFrame) -> pd.DataFrame:
    remote_ratio_map = {
        0: "No remote work",
        50: "Partially remote",
        100: "Fully remote",
    }

    experience_level_map = {
        "EN": "Junior",
        "MI": "Intermediate",
        "SE": "Expert",
        "EX": "Director",
    }

    employment_type_map = {
        "PT": "Part-time",
        "FT": "Full-time",
        "CT": "Contract",
        "FL": "Freelance",
    }

    company_size_map = {
        "S": "Small",
        "M": "Medium",
        "L": "Large",
    }

    df: pd.DataFrame = (
        df.astype(
            {
                "work_year": "category",
                "experience_level": pd.CategoricalDtype(
                    experience_level_map.keys(), ordered=True
                ),
                "employment_type": pd.CategoricalDtype(
                    employment_type_map.keys(), ordered=True
                ),
                "company_size": pd.CategoricalDtype(
                    company_size_map.keys(), ordered=True
                ),
                "remote_ratio": pd.CategoricalDtype(
                    remote_ratio_map.keys(), ordered=True
                ),
            }
        )
        .drop(columns=["salary", "salary_currency"])
        .replace(
            {
                "remote_ratio": remote_ratio_map,
                "experience_level": experience_level_map,
                "employment_type": employment_type_map,
                "company_size": company_size_map,
                "employee_residence": dict(
                    zip(
                        df.employee_residence.unique(),
                        coco.convert(df.employee_residence.unique(), to="name_short"),
                    )
                ),
                "company_location": dict(
                    zip(
                        df.company_location.unique(),
                        coco.convert(df.company_location.unique(), to="name_short"),
                    )
                ),
            },
        )
        .assign(working_overseas=lambda x: x.employee_residence != x.company_location)
    )
    return df


```
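
To see why the cleaning step uses ordered categories, here is a minimal sketch on a made-up series. It takes a slightly different route from `clean_df` (it maps the codes first and then builds the categorical), so treat it as an illustration of the idea rather than the project's exact code.

```python
import pandas as pd

# Same code-to-label mapping as in clean_df; insertion order runs Junior -> Director.
experience_level_map = {
    "EN": "Junior",
    "MI": "Intermediate",
    "SE": "Expert",
    "EX": "Director",
}

# Made-up raw codes standing in for the experience_level column.
raw = pd.Series(["SE", "EN", "EX", "MI", "EN"])

levels = pd.Series(
    pd.Categorical(
        raw.map(experience_level_map),
        categories=list(experience_level_map.values()),
        ordered=True,
    )
)

# Sorting follows the declared category order, not alphabetical order.
sorted_levels = levels.sort_values().tolist()
```

With an ordered categorical, plots and `sort_values` list "Junior" before "Intermediate", which a plain string column would not do.
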

<div style="page-break-after: always;"></div>

<i>**🏠_Home.py**</i>
<font size=2>

```python
import streamlit as st

from utils.loader import load_data
```
<br>
    
**Caching the Data for performance boost:**
```python
if "df" not in st.session_state:
    st.session_state.df = st.cache(load_data)()


st.image("./images/cover.jpeg")
```
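
The effect of caching the loader can be illustrated outside Streamlit. The sketch below uses `functools.lru_cache` from the standard library as a stand-in for Streamlit's cache; `slow_load` and the `calls` list are hypothetical names introduced only for this example.

```python
from functools import lru_cache

calls = []  # records every time the function body actually runs


@lru_cache(maxsize=None)
def slow_load() -> tuple[int, ...]:
    """Hypothetical stand-in for an expensive database read."""
    calls.append("loaded")
    return (1, 2, 3)


first = slow_load()   # runs the function body
second = slow_load()  # served from the cache, body is skipped
```

After both calls `calls` contains a single entry, which is the saving the dashboard relies on when pages are re-run.
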
<br>
    
**Code for Home Page:**
```python
st.markdown(
    f"""
<h3 align="center">
Informatics Practices Investigatory Project (2022-23)
</h3>
---
<h2 align="center">
Web-based Interactive Dashboard for
</h2>

<h1 align="center">
Job Salaries in Data Science
</h1>

---

### About the Dataset:

**Source**: https://salaries.ai-jobs.net/download/

**Shape**: `{st.session_state.df.shape[0]}` _rows_ x `{st.session_state.df.shape[1]}` _columns_

---

### Columns:  

1. **work_year**: The year the salary was paid. 
1. **experience_level**: The experience level in the job during the year with the following possible values:  
    - Junior  
    - Intermediate  
    - Expert  
    - Director  

1. **employment_type**: The type of employment for the role:
    - Part-time
    - Full-time
    - Contract
    - Freelance

1. **job_title**: The role worked in during the year.

1. **salary_in_usd**: The total gross salary amount paid in USD. 

1. **employee_residence**: Employee's primary country of residence during the work year.

1. **remote_ratio**: The overall amount of work done remotely, possible values are as follows:
    - No remote work (less than 20%)
    - Partially remote
    - Fully remote (more than 80%)

1. **company_location**: The country of the employer's main office or contracting branch

1. **company_size**: The average number of people that worked for the company during the year.
    - Small (less than 50 employees)
    - Medium (50 to 250 employees)
    - Large (more than 250 employees)
""",
    unsafe_allow_html=True,
)

st.dataframe(st.session_state.df.sample(n=5, random_state=42))


```
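
The thresholds quoted in the column descriptions above can be written down as two small functions. This is a sketch only: the dataset already stores the coded values, and `remote_label` and `size_label` are hypothetical helpers, not part of the project.

```python
def remote_label(percent_remote: float) -> str:
    """Map a share of remote work (0-100) to the label used on the home page."""
    if percent_remote < 20:
        return "No remote work"
    if percent_remote > 80:
        return "Fully remote"
    return "Partially remote"


def size_label(employees: int) -> str:
    """Map an average head count to the company size label."""
    if employees < 50:
        return "Small"
    if employees <= 250:
        return "Medium"
    return "Large"


labels = [remote_label(p) for p in (0, 50, 100)]
sizes = [size_label(n) for n in (49, 50, 250, 251)]
```
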

<div style="page-break-after: always;"></div>

<i>**pages/1_📊_Plot_By_Job_Title**</i>
<font size=2>

```python
import pandas as pd
import plotly.express as px
import streamlit as st

from utils.loader import load_data


if "df" not in st.session_state:
    st.session_state.df = st.cache(load_data)()

```
<br>
    
**Code for Plot-1:**
```python
def bar_job_title(df: pd.DataFrame, work_years: list[int]):

    # No year ticked in the sidebar: there is nothing to plot.
    if not work_years:
        return None, None

    df_sub = df.query("work_year in @work_years")

    top_jt = df_sub.job_title.value_counts().head(15)

    jt = (
        df_sub.groupby(by=["job_title", "experience_level"])
        .salary_in_usd.count()
        .reset_index()
        .rename(columns={"salary_in_usd": "no_of_empls"})
        .set_index("job_title")
        .loc[top_jt.index.to_list()]
        .reset_index()
    )

    no_1_jt = jt.query("job_title == @top_jt.head(1).index.values[0]")

    fig = px.bar(
        jt,
        x="job_title",
        y="no_of_empls",
        color="experience_level",
        labels={
            "job_title": "Job Title",
            "no_of_empls": "No. of Employees",
            "experience_level": "Experience Level",
        },
    )

    return no_1_jt, fig

```
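
The "most frequent job titles" step in Plot-1 boils down to counting and keeping the top entries. A minimal sketch with the standard library, using made-up titles and a hypothetical `top_titles` helper:

```python
from collections import Counter

# Made-up job titles standing in for the job_title column.
job_titles = [
    "Data Scientist",
    "Data Engineer",
    "Data Scientist",
    "Data Analyst",
    "Data Engineer",
    "Data Scientist",
]


def top_titles(titles: list[str], n: int) -> list[tuple[str, int]]:
    """Return the n most common titles with their counts, most common first."""
    return Counter(titles).most_common(n)


top_two = top_titles(job_titles, 2)
```

In the dashboard the same idea is applied with `value_counts().head(15)`, and the counts are then split by experience level for the stacked bars.
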
<br>
    
**Code for Plot-2:**
```python
def bar_sal_job_title(df: pd.DataFrame, work_years: list[int]):

    # No year ticked in the sidebar: there is nothing to plot.
    if not work_years:
        return None, None

    df_sub = df.query("work_year in @work_years")

    ms = (
        df_sub.groupby(["job_title", "experience_level"])
        .salary_in_usd.median()
        .reset_index()
        .rename(columns={"salary_in_usd": "median_salary"})
    )

    top_ms = ms.groupby("job_title").median_salary.sum().nlargest(15)

    no_1_ms = ms.query("job_title == @top_ms.head(1).index.values[0]")

    # Keep only the rows of the 15 top-paying job titles.
    ms15 = ms.set_index("job_title").loc[top_ms.index.to_list()].reset_index()

    fig = px.bar(
        ms15,
        x="job_title",
        y="median_salary",
        color="experience_level",
        labels={"median_salary": "Median Salary", "job_title": "Job Title"},
    )

    return no_1_ms, fig


st.sidebar.write("""Filter Years:""")
```
<br>
    
**Code for Displaying the plots:**
```python
work_years = []

if st.sidebar.checkbox(label="2020", value=True):
    work_years.append(2020)

if st.sidebar.checkbox(label="2021", value=True):
    work_years.append(2021)

if st.sidebar.checkbox(label="2022", value=True):
    work_years.append(2022)


no_1_jt, fig_jt = bar_job_title(df=st.session_state.df, work_years=work_years)
no_1_ms, fig_ms = bar_sal_job_title(df=st.session_state.df, work_years=work_years)

tab1, tab2 = st.tabs(["No. of Employees", "Median Salary"])

with tab1:
    st.markdown(
        f"""\
    # No. of Employees by Job Title
    ##### Years: [{', '.join(str(y) for y in work_years)}]
    ---\
    """
    )

    if no_1_jt is not None:
        left, right = st.columns(2)
        left.metric(label="Top Job Title", value=no_1_jt.job_title.iloc[0])
        right.metric(label="Total No.of Employees", value=no_1_jt.no_of_empls.sum())

    if fig_jt is not None:
        st.plotly_chart(fig_jt)


with tab2:
    st.markdown(
        f"""\
        # Median Salary by Job Title
        ##### Years: [{', '.join(str(y) for y in work_years)}]
        ---\

        """
    )
    if no_1_ms is not None:
        left, right = st.columns([2, 1])
        left.metric(label="Highest Paying Job Title", value=no_1_ms.job_title.iloc[0])
        right.metric(
            label="Median Salary",
            value=f"{no_1_ms.median_salary.sum(): ,} USD",
        )

    if fig_ms is not None:
        st.plotly_chart(fig_ms)


```

<div style="page-break-after: always;"></div>

<i>**pages/2_🌎_Plot_By_Country**</i>
<font size=2>

```python
import country_converter as coco
import pandas as pd
import plotly.express as px
import streamlit as st

from utils.loader import load_data

if "df" not in st.session_state:
    st.session_state.df = st.cache(load_data)()

```
<br>
    
**Code for Plot-3:**
```python
def chloropleth_empl_country(
    df: pd.DataFrame, work_years: list[int], experience_levels: list[str]
):

    cn = (
        df.query("work_year in @work_years and experience_level in @experience_levels")
        .employee_residence.value_counts()
        # Name the index and the counts explicitly; the default names
        # produced by value_counts() differ between pandas versions.
        .rename_axis("country")
        .reset_index(name="no_of_empls")
    )

    fig = px.choropleth(
        cn,
        locations=coco.convert(names=cn.country, to="ISO3"),
        color="no_of_empls",
        range_color=(0, cn.no_of_empls.quantile(0.98)),
        hover_name="country",
        labels={
            "no_of_empls": "No. of Employees",
        },
    )

    return fig

```
<br>
    
**Code for Plot-4:**
```python
def chloropleth_sal_country(
    df: pd.DataFrame, work_years: list[int], experience_levels: list[str]
):

    sal_by_country = (
        df.query("work_year in @work_years and experience_level in @experience_levels")
        # Take the median over every salary row of each country; grouping by
        # salary first would collapse repeated salaries and skew the median.
        .groupby("company_location")
        .salary_in_usd.median()
        .reset_index()
    )

    fig = px.choropleth(
        locations=coco.convert(names=sal_by_country.company_location, to="ISO3"),
        color=sal_by_country.salary_in_usd,
    )

    return fig

```
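
What "median salary by country" means can be checked by hand on a few rows. The sketch below uses only the standard library; the rows and the `median_by_country` helper are made up for illustration. It also shows that dropping duplicate salaries before taking the median can change the answer.

```python
from collections import defaultdict
from statistics import median

# Made-up (company_location, salary_in_usd) rows; note the repeated 100000.
rows = [
    ("India", 20000),
    ("India", 30000),
    ("United States", 100000),
    ("United States", 100000),
    ("United States", 150000),
    ("United States", 160000),
]


def median_by_country(pairs) -> dict[str, float]:
    """Group salaries by country and take the median of each group."""
    by_country = defaultdict(list)
    for country, salary in pairs:
        by_country[country].append(salary)
    return {country: median(values) for country, values in by_country.items()}


medians = median_by_country(rows)                # every row counts
medians_distinct = median_by_country(set(rows))  # duplicates removed first
```

For the United States rows the two results differ (125000 with all rows, 150000 after removing the duplicate), so the grouping order matters.
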
<br>
    
**Code for Displaying the plots:**
```python
st.sidebar.write("Filter Years: ")

# Select every year from 2020 up to the year chosen on the slider.
work_years = list(
    range(2020, st.sidebar.slider(label="Year", min_value=2020, max_value=2022) + 1)
)

st.sidebar.write("Filter Experience Levels: ")
experience_levels = []

if st.sidebar.checkbox(label="Junior", value=True):
    experience_levels.append("Junior")

if st.sidebar.checkbox(label="Intermediate", value=True):
    experience_levels.append("Intermediate")

if st.sidebar.checkbox(label="Expert", value=True):
    experience_levels.append("Expert")

if st.sidebar.checkbox(label="Director", value=True):
    experience_levels.append("Director")


fig = chloropleth_empl_country(
    st.session_state.df, work_years=work_years, experience_levels=experience_levels
)

fig_2 = chloropleth_sal_country(
    st.session_state.df, work_years=work_years, experience_levels=experience_levels
)

tab1, tab2 = st.tabs(["No. of Employees", "Median Salary"])

with tab1:
    st.markdown(
        f"""
# No. of Employees by Country
---
##### Years: [{', '.join(str(y) for y in work_years)}]
##### Exp. Levels: [{', '.join(experience_levels)}]
"""
    )

    st.plotly_chart(fig)


with tab2:
    st.markdown(
        f"""
# Median Salary by Country
---
##### Years: [{', '.join(str(y) for y in work_years)}]
##### Exp. Levels: [{', '.join(experience_levels)}]
"""
    )

    st.plotly_chart(fig_2)


```
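
The year slider and the "Years" heading on this page follow a simple rule: every year from 2020 up to the selected one is included. A small sketch of that rule without Streamlit, where `years_up_to` and `years_heading` are hypothetical helper names:

```python
def years_up_to(selected: int, first: int = 2020) -> list[int]:
    """Return every year from `first` up to and including `selected`."""
    return list(range(first, selected + 1))


def years_heading(years: list[int]) -> str:
    """Format the selected years the way the page headings do."""
    return f"##### Years: [{', '.join(str(y) for y in years)}]"


years = years_up_to(2021)
heading = years_heading(years)
```
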

<div style="page-break-after: always;"></div>

<i>**pages/3_📊_Distribution_By_Year**</i>
<font size=2>

```python
import pandas as pd
import plotly.express as px
import streamlit as st

from utils.loader import load_data

if "df" not in st.session_state:
    st.session_state.df = st.cache(load_data)()

```
<br>
    
**Code for Plot-5:**
```python
def pie_exp_lvl(df: pd.DataFrame, work_years: list[int]):

    exp_lvl = (
        df.query("work_year in @work_years")
        .experience_level.value_counts()
        .sort_index()
    )
    fig = px.pie(names=exp_lvl.index, values=exp_lvl.values, color=exp_lvl.index)
    fig.update_traces(
        textinfo="label+percent+value",
    )

    return fig

```
<br>
    
**Code for Plot-6:**
```python
def pie_comp_size(df: pd.DataFrame, work_years: list[int]):

    comp_size = (
        df.query("work_year in @work_years").company_size.value_counts().sort_index()
    )
    fig = px.pie(
        names=comp_size.index,
        values=comp_size.values,
        color=comp_size.index,
    )
    fig.update_traces(
        textinfo="label+percent+value",
    )

    return fig

```
<br>
    
**Code for Plot-7:**
```python
def pie_empl_type(df: pd.DataFrame, work_years: list[int]):

    empl_type = (
        df.query("work_year in @work_years").employment_type.value_counts().sort_index()
    )
    fig = px.pie(
        names=empl_type.index,
        values=empl_type.values,
        color=empl_type.index,
    )
    fig.update_traces(
        textinfo="label+percent+value",
    )

    return fig

```
<br>
    
**Code for Displaying the plots:**
```python
# Select every year from 2020 up to the year chosen on the slider.
work_years = list(
    range(2020, st.sidebar.slider(label="Year", min_value=2020, max_value=2022) + 1)
)

fig1 = pie_exp_lvl(df=st.session_state.df, work_years=work_years)
fig2 = pie_comp_size(df=st.session_state.df, work_years=work_years)
fig3 = pie_empl_type(df=st.session_state.df, work_years=work_years)

tab1, tab2, tab3 = st.tabs(["Experience Level", "Company Size", "Employment Type"])

with tab1:
    st.markdown(
        f"""
# Distribution of Experience Level
##### Years: [{', '.join(str(y) for y in work_years)}]
---\
"""
    )

    st.plotly_chart(fig1, use_container_width=True)


with tab2:
    st.markdown(
        f"""
# Distribution of Company Size
##### Years: [{', '.join(str(y) for y in work_years)}]
---\
"""
    )
    st.plotly_chart(fig2, use_container_width=True)


with tab3:
    st.markdown(
        f"""
# Distribution of Employment Type
##### Years: [{', '.join(str(y) for y in work_years)}]
---\
"""
    )
    st.plotly_chart(fig3, use_container_width=True)



```

<div style="page-break-after: always;"></div>

<i>**pages/4_📈_Salary_Distribution**</i>
<font size=2>

```python
import pandas as pd
import plotly.figure_factory as ff
import streamlit as st

from utils.loader import load_data

if "df" not in st.session_state:
    st.session_state.df = st.cache(load_data)()

```
<br>
    
**Code for Plot-8:**
```python
def dist_sal_by_work_year(df: pd.DataFrame):

    y2020 = df.query("work_year == 2020")
    y2021 = df.query("work_year == 2021")
    y2022 = df.query("work_year == 2022")

    hist_data = [y2020.salary_in_usd, y2021.salary_in_usd, y2022.salary_in_usd]
    group_labels = ["2020", "2021", "2022"]

    fig = ff.create_distplot(hist_data, group_labels, show_hist=False)

    return fig

```
<br>
    
**Code for Plot-9:**
```python
def dist_sal_by_exp_level(df: pd.DataFrame):
    exp_level_sal = df[["experience_level", "salary_in_usd"]]

    entry_salary = exp_level_sal.query("experience_level == 'Junior'")
    executive_salary = exp_level_sal.query("experience_level == 'Director'")
    mid_salary = exp_level_sal.query("experience_level == 'Intermediate'")
    senior_salary = exp_level_sal.query("experience_level == 'Expert'")

    hist_data = [
        entry_salary.salary_in_usd,
        mid_salary.salary_in_usd,
        senior_salary.salary_in_usd,
        executive_salary.salary_in_usd,
    ]
    group_labels = ["Junior", "Intermediate", "Expert", "Director"]

    fig = ff.create_distplot(
        hist_data,
        group_labels,
        show_hist=False,
    )

    return fig

```
<br>
    
**Code for Plot-10:**
```python
def dist_sal_by_company_size(df: pd.DataFrame):
    c_size = df[["company_size", "salary_in_usd"]]

    # Split the salaries by company size.
    small = c_size.loc[c_size["company_size"] == "Small"]
    medium = c_size.loc[c_size["company_size"] == "Medium"]
    large = c_size.loc[c_size["company_size"] == "Large"]

    hist_data = [
        small["salary_in_usd"],
        medium["salary_in_usd"],
        large["salary_in_usd"],
    ]
    # Labels match the category names used everywhere else in the dashboard.
    group_labels = ["Small", "Medium", "Large"]

    fig = ff.create_distplot(hist_data, group_labels, show_hist=False)

    return fig

```
<br>
    
**Code for Displaying the plots:**
```python
fig1 = dist_sal_by_work_year(st.session_state.df)
fig2 = dist_sal_by_exp_level(st.session_state.df)
fig3 = dist_sal_by_company_size(st.session_state.df)


tab1, tab2, tab3 = st.tabs(["Work Year", "Experience Level", "Company Size"])

with tab1:
    st.markdown(
        f"""
# Distribution of Salary by Work Year

"""
    )

    st.plotly_chart(fig1, use_container_width=True)


with tab2:
    st.markdown(
        f"""
# Distribution of Salary by Experience Level
"""
    )
    st.plotly_chart(fig2)


with tab3:
    st.markdown(
        f"""
# Distribution of Salary by Company Size
"""
    )
    st.plotly_chart(fig3)


```

<div style="page-break-after: always;"></div>

<i>**pages/5_📊_Salary_Box_Plot**</i>
<font size=2>

```python
import pandas as pd
import plotly.express as px
import streamlit as st

from utils.loader import load_data

if "df" not in st.session_state:
    st.session_state.df = st.cache(load_data)()

```
<br>
    
**Code for Plot-11:**
```python
def box_sal_exp(df: pd.DataFrame, work_years: list[int]):

    df_sub = df.query("work_year in @work_years").sort_values(by="experience_level")

    fig = px.box(
        df_sub,
        x="experience_level",
        y="salary_in_usd",
        color="experience_level",
        labels={
            "experience_level": "Experience Level",
            "salary_in_usd": "Salary in USD",
        },
    )

    return fig

```
<br>
    
**Code for Plot-12:**
```python
def box_sal_comp_size(df: pd.DataFrame, work_years: list[int]):

    df_sub = df.query("work_year in @work_years").sort_values(by="company_size")

    fig = px.box(
        df_sub,
        x="company_size",
        y="salary_in_usd",
        color="company_size",
        labels={"company_size": "Company Size", "salary_in_usd": "Salary in USD"},
    )

    return fig

```
<br>
    
**Code for Plot-13:**
```python
def box_sal_empl_type(df: pd.DataFrame, work_years: list[int]):

    df_sub = df.query("work_year in @work_years").sort_values(by="employment_type")
    fig = px.box(
        df_sub,
        x="employment_type",
        y="salary_in_usd",
        color="employment_type",
        labels={"employment_type": "Employment Type", "salary_in_usd": "Salary in USD"},
    )

    return fig

```
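
A box plot summarises each group by its quartiles. As a rough sketch of the numbers behind one box, the snippet below computes a five-number summary with the standard library. The salaries are made up, `five_number_summary` is a hypothetical helper, and Plotly's own quartile and whisker rules may differ from the "inclusive" method assumed here.

```python
from statistics import quantiles

# Made-up salaries in thousands of USD.
salaries = [30, 40, 50, 60, 70, 80, 90, 100, 110]


def five_number_summary(values: list[float]) -> tuple[float, ...]:
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


summary = five_number_summary(salaries)
```
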
<br>
    
**Code for Displaying the plots:**
```python
# Select every year from 2020 up to the year chosen on the slider.
work_years = list(
    range(2020, st.sidebar.slider(label="Year", min_value=2020, max_value=2022) + 1)
)


fig1 = box_sal_exp(st.session_state.df, work_years)
fig2 = box_sal_comp_size(st.session_state.df, work_years)
fig3 = box_sal_empl_type(st.session_state.df, work_years)


tab1, tab2, tab3 = st.tabs(["Experience Level", "Company Size", "Employment Type"])

with tab1:
    st.markdown(
        f"""
# Characteristics of Salary by Experience Level
##### Years: [{', '.join(str(y) for y in work_years)}]
---\
"""
    )

    st.plotly_chart(fig1, use_container_width=True)


with tab2:
    st.markdown(
        f"""
# Characteristics of Salary by Company Size
##### Years: [{', '.join(str(y) for y in work_years)}]
---\
"""
    )
    st.plotly_chart(fig2)


with tab3:
    st.markdown(
        f"""
# Characteristics of Salary by Employment Type
##### Years: [{', '.join(str(y) for y in work_years)}]
---\
"""
    )
    st.plotly_chart(fig3)

```

<div style="page-break-after: always;"></div>
