TimeVariant: Code Optimization #51

Open
Tanvi-Jain01 opened this issue Jul 9, 2023 · 0 comments

@nipunbatra , @patel-zeel

Description:

I went through the source code of timeVariantplot, and I think it can be generalized to work with any pollutant and tidied up in a few places.

Source Code:

Code Optimization 1

pollutant = ["pm10", "no2", "pm25", "so2"]

This hard-coded list ties the function to these four pollutants only, so it would be better to take the list of pollutants from the user.

Proposed solution:
The method signature would then look like this:

def timeVariation(df, pollutant=['pm25']):
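Once the list comes from the user, it may be worth checking it against the data frame before plotting. A minimal sketch, assuming each pollutant is a column of df; the helper name check_pollutants is hypothetical and not part of the existing code:

```python
import pandas as pd


def check_pollutants(df, pollutant):
    """Return the requested pollutant names, raising if any column is missing.

    Hypothetical helper for illustration; not part of the existing code.
    """
    # Accept a single name as well as a list of names.
    if isinstance(pollutant, str):
        pollutant = [pollutant]
    missing = [name for name in pollutant if name not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in the data frame: {missing}")
    return list(pollutant)


# Small made-up frame, only to show the call.
df = pd.DataFrame({"date": ["2023-01-01 00:00"], "pm25": [41.0], "nh3": [3.2]})
selected = check_pollutants(df, ["pm25", "nh3"])
```

Failing early with the missing column names is easier to debug than a KeyError raised from inside a plotting loop.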

Code Optimization 2

df_day_m = df_day.groupby("hour").mean()
df_day_m = df_day_m.reset_index()

The two lines above can be chained into one:

Proposed solution:

df_day_m = df_day.groupby("hour").mean().reset_index()
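The chained form gives the same result as the two-step form; the gain is readability rather than speed. A small check on made-up data (the values are illustrative only):

```python
import pandas as pd

# Made-up hourly readings, used only to compare the two forms.
df_day = pd.DataFrame({"hour": [0, 0, 1, 1], "pm25": [10.0, 20.0, 30.0, 50.0]})

# Two-step version from the original code.
two_step = df_day.groupby("hour").mean()
two_step = two_step.reset_index()

# Chained version from the proposal.
chained = df_day.groupby("hour").mean().reset_index()

print(two_step.equals(chained))  # True
```

If the frame also holds non-numeric columns (for example a day name), recent pandas versions raise a TypeError from mean() unless numeric_only=True is passed.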

Code Optimization 3

df_hour = df

df_month = df

These assignments should be grouped together at the start of the function:
Proposed solution:

    df_days = df
    df_hour = df
    df_month = df
    df_weekday = df
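Note that plain assignment does not copy a DataFrame, so all four names above refer to the same object as df. A short sketch on made-up data showing the difference from an explicit .copy():

```python
import pandas as pd

df = pd.DataFrame({"pm25": [10.0, 20.0]})

df_hour = df          # alias: same object as df
df_month = df.copy()  # independent copy

df_hour["hour"] = [0, 1]  # the new column also appears in df

print(df_hour is df)               # True
print("hour" in df.columns)        # True
print("hour" in df_month.columns)  # False
```

The improved code relies on this aliasing (columns added through one name are visible through the others), so switching to copies would require each block to group by its own frame.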

Improved Code:

def timeVariation(df, pollutant=['pm25']):
    """ 
    Draws four plots:
    - The average pollutant level by hour for each
    day of the week, across all of the data
    - The average pollutant level by hour,
    across all data
    - The average pollutant level by month of the
    year, across all data
    - The average pollutant level by day of the week,
    across all data

    Parameters
    ----------
    df: pandas.DataFrame
      Data frame of hourly data.
      Must include a "date" column and at least one pollutant column to plot
    pollutant: list of str
      Names of the pollutants to plot

    """
    import matplotlib.pyplot as plt
    import pandas as pd

    # Work on a copy so the caller's DataFrame is not modified.
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])

    # All four names refer to the same DataFrame; plain assignment does not copy.
    df_days = df
    df_hour = df
    df_month = df
    df_weekday = df
    
    df_days["day"] = df_days["date"].dt.day_name()
    df_days = df_days.set_index(keys=["day"])
    # Group by the scalar key "day" so get_group() accepts a plain day name.
    df_days = df_days.groupby("day")

    dayWeek = [
        "Monday",
        "Tuesday",
        "Wednesday",
        "Thursday",
        "Friday",
        "Saturday",
        "Sunday",
    ]

    for i in range(len(dayWeek)):
        plt.figure(1, figsize=(40, 5))
        plt.subplot(1, 7, i + 1)
        plt.grid()

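        # Copy the group so that adding the hour column does not
        # trigger a SettingWithCopyWarning.
        df_day = df_days.get_group(dayWeek[i]).copy()
        df_day["hour"] = df_day["date"].dt.hour

        # numeric_only=True skips non-numeric columns such as the date.
        df_day_m = df_day.groupby("hour").mean(numeric_only=True).reset_index()
        df_day_s = df_day.groupby("hour").std(numeric_only=True).reset_index()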
       

        for k in range(len(pollutant)):
            plt.plot(df_day_m["hour"], df_day_m[pollutant[k]], label=pollutant[k])
            plt.fill_between(
                df_day_s["hour"],
                df_day_m[pollutant[k]] - 0.5 * df_day_s[pollutant[k]],
                df_day_m[pollutant[k]] + 0.5 * df_day_s[pollutant[k]],
                alpha=0.2,
            )
            plt.xlabel(dayWeek[i])
            plt.legend()
    plt.savefig("TimeVariationPlots1.png", bbox_inches="tight")

    plt.figure(2, figsize=(35, 5))
    plt.subplot(1, 3, 1)
    plt.grid()

    df_hour["hour"] = df_hour["date"].dt.hour
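    # numeric_only=True skips non-numeric columns such as the day name.
    df_hour_m = df_hour.groupby("hour").mean(numeric_only=True).reset_index()
    df_hour_s = df_hour.groupby("hour").std(numeric_only=True).reset_index()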
    
    for i in range(len(pollutant)):
        plt.plot(df_hour_m["hour"], df_hour_m[pollutant[i]], label=pollutant[i])
        plt.fill_between(
            df_hour_s["hour"],
            df_hour_m[pollutant[i]] - 0.5 * df_hour_s[pollutant[i]],
            df_hour_m[pollutant[i]] + 0.5 * df_hour_s[pollutant[i]],
            alpha=0.2,
        )
        plt.xlabel("Hour")
        plt.legend()

    plt.subplot(1, 3, 2)
    plt.grid()

    df_month["month"] = df_month["date"].dt.month
    df_month_m = df_month.groupby("month").mean(numeric_only=True).reset_index()
    df_month_s = df_month.groupby("month").std(numeric_only=True).reset_index()
    
    for i in range(len(pollutant)):
        plt.plot(df_month_m["month"], df_month_m[pollutant[i]], label=pollutant[i])
        plt.fill_between(
            df_month_s["month"],
            df_month_m[pollutant[i]] - 0.5 * df_month_s[pollutant[i]],
            df_month_m[pollutant[i]] + 0.5 * df_month_s[pollutant[i]],
            alpha=0.2,
        )
        plt.xlabel("Month")
        plt.legend()

    plt.subplot(1, 3, 3)
    plt.grid()
    
    df_weekday["weekday"] = df_weekday["date"].dt.weekday
    df_weekday_m = df_weekday.groupby("weekday").mean(numeric_only=True).reset_index()
    df_weekday_s = df_weekday.groupby("weekday").std(numeric_only=True).reset_index()

    for i in range(len(pollutant)):
        plt.plot(
            df_weekday_m["weekday"], df_weekday_m[pollutant[i]], label=pollutant[i]
        )
        plt.fill_between(
            df_weekday_s["weekday"],
            df_weekday_m[pollutant[i]] - 0.5 * df_weekday_s[pollutant[i]],
            df_weekday_m[pollutant[i]] + 0.5 * df_weekday_s[pollutant[i]],
            alpha=0.2,
        )
        plt.xlabel("WeekDay")
        plt.legend()
    plt.savefig("TimeVariationPlots2.png", bbox_inches="tight")
    print("Your plots have also been saved")
    plt.show()
    

timeVariation(df1, pollutant=['pm25','nh3'])

# =============================================================================
# df = pd.read_csv("mydata.csv")
# timeVariation(df,['pm10'])
# =============================================================================

NOTE: I'm also adding plt.savefig("TimeVariationPlots2.png", bbox_inches="tight") to save the figure.
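The four plotting blocks in the improved code repeat the same pattern: group by a time key, plot the mean, and shade a band of half a standard deviation on either side. One possible further simplification, as a sketch only: the helper name plot_mean_band and its width parameter are hypothetical, and the data below is made up.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_mean_band(ax, df, key, pollutant, width=0.5):
    """Plot the mean of each pollutant per `key`, with a +/- width*std band.

    Hypothetical helper for illustration; not part of the existing code.
    """
    # Selecting only the pollutant columns avoids averaging non-numeric ones.
    grouped = df.groupby(key)[list(pollutant)]
    mean = grouped.mean()
    std = grouped.std()
    for name in pollutant:
        ax.plot(mean.index, mean[name], label=name)
        ax.fill_between(
            mean.index,
            mean[name] - width * std[name],
            mean[name] + width * std[name],
            alpha=0.2,
        )
    ax.set_xlabel(key)
    ax.grid()
    ax.legend()
    return mean, std


# Made-up hourly data covering two days, only to exercise the helper.
dates = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"date": dates, "pm25": range(48)})
df["hour"] = df["date"].dt.hour

fig, ax = plt.subplots()
mean, std = plot_mean_band(ax, df, "hour", ["pm25"])
```

Each of the hour, month, and weekday panels could then become a single call with a different key, which keeps the band width and styling in one place.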

Output:

TimeVariationPlots1
TimeVariationPlots2
