# <span style = 'color: #BC8F8F'>WEATHER DATA ANALYSIS, STATISTICS, AND MODELING</span>

## <span style = 'color : #FFFACD'>Part 1: Content</span>
<p style="text-indent:1cm;">This project loads weather data from a .csv file, computes basic statistics, and plots the results as graphs.</p>

## <span style = 'color : #FFFACD'>Part 2: Steps Implementation</span>
<p style="text-indent:1cm;">Step 1: Load weather data.</p>
<p style="text-indent:1cm;">Step 2: Preprocess data (convert date format, remove missing values).</p>
<p style="text-indent:1cm;">Step 3: Calculate basic statistics (average, maximum, minimum).</p>
<p style="text-indent:1cm;">Step 4: Draw a graph of data over time.</p>
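Steps 1 to 3 can be sketched with pandas alone. This is a minimal illustration, not the notebook's own code: the sample rows and the helper name `summarize_weather` are hypothetical, the column names follow the ones used later in the notebook, and `dropna()` is one simple way to remove rows with missing values. Step 4 (plotting) is left to the chart functions defined later.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real data lives in Data/Data.csv.
SAMPLE_CSV = """Date,Temperature,Humidity
2023-01-01,20.5,60.0
2023-01-02,,65.0
2023-02-01,25.0,70.0
"""


def summarize_weather(csv_source) -> pd.DataFrame:
    """Load, clean, and summarize weather data (steps 1-3)."""
    # Step 1: load the CSV (a file path or a file-like object both work).
    weather = pd.read_csv(csv_source)
    # Step 2: convert the date column, then drop rows with missing values.
    weather["Date"] = pd.to_datetime(weather["Date"])
    weather = weather.dropna()
    # Step 3: mean, max, and min of every numeric column.
    return weather.select_dtypes("number").agg(["mean", "max", "min"])


stats = summarize_weather(io.StringIO(SAMPLE_CSV))
print(stats)
```

The row dated 2023-01-02 has no temperature, so it is dropped before the statistics are computed.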

## <span style = 'color : #FFFACD'>Part 3: Introduction</span>
### <span style='color : #D2B48C'><i>List Of Folders</i></span>
<span>
    <p style="text-indent:1cm;">1. Data Folder: Stores the file Data.csv, which holds every property: date, temperature, humidity, precipitation, wind speed, and pressure.</p>
    <p style="text-indent:1cm;">2. Average Folder: Stores the graphs of the monthly average of each property listed above.</p>
    <p style="text-indent:1cm;">3. Min_Max Folder: Stores the graphs of the monthly minimum and maximum of each property listed above.</p>
    <p style="text-indent:1cm;">4. Report Folder: Stores only the file Report.txt, which holds the statistics computed from the data: the minimum, maximum, and mean of each property, plus the monthly averages.</p>
    <p style="text-indent:1cm;">5. Introduce Folder: Stores the images used in this Jupyter notebook to make it easier to follow.</p>
</span>

### <span style='color : #D2B48C'><i>File Data.csv Format</i></span>
![CSVFileFormat.png](attachment:CSVFileFormat.png)
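Judging from the `create_report` docstring, the code relies on the columns `Date`, `Temperature`, `Humidity`, `Precipitation`, `Wind_Speed`, and `Pressure`, with `Date` as the first column, since `main()` plots every column between the first and the last one. A small check like the following can catch a malformed file before any plotting starts; it is a sketch, and `missing_columns` is a hypothetical helper.

```python
import pandas as pd

# Column names as listed in the create_report docstring.
REQUIRED_COLUMNS = ["Date", "Temperature", "Humidity", "Precipitation", "Wind_Speed", "Pressure"]


def missing_columns(weather: pd.DataFrame) -> list:
    """Return the required columns that are absent from the DataFrame."""
    return [col for col in REQUIRED_COLUMNS if col not in weather.columns]


sample = pd.DataFrame({"Date": ["2023-01-01"], "Temperature": [20.5], "Humidity": [60.0]})
print(missing_columns(sample))  # ['Precipitation', 'Wind_Speed', 'Pressure']
```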

### <span style='color : #D2B48C'><i>Example For Result Graph</i></span>
<table>
  <tr>
    <td><img src="./Introduce/Average.png"></td>
    <td><img src="./Introduce/Min_Max.png"></td>
  </tr>
</table>
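Each Average graph plots one value per month, and each Min_Max graph plots two. As a sketch (the helper `monthly_stats` and the sample values are hypothetical), all three series can come from a single `groupby(...).agg(...)` call instead of separate `mean()`, `min()`, and `max()` passes:

```python
import pandas as pd


def monthly_stats(weather: pd.DataFrame, label: str) -> pd.DataFrame:
    """Return the monthly mean, min, and max of one column, indexed by 'YYYY-MM'."""
    months = pd.to_datetime(weather["Date"]).dt.to_period("M")
    result = weather.groupby(months)[label].agg(["mean", "min", "max"])
    result.index = result.index.astype(str)  # plain 'YYYY-MM' labels for the x-axis
    return result


sample = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-15", "2023-02-01"],
    "Temperature": [20.0, 24.0, 30.0],
})
print(monthly_stats(sample, "Temperature"))
```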

### <span style='color : #D2B48C'><i>Example For File Report</i></span>
```plaintext
    Report
    
    Basic statistics:
        Highest Temperature: 40.79 on 2023-05-04 00:00:00
        Lowest Temperature: -0.27 on 2023-10-01 00:00:00
        Highest Humidity: 99.6 on 2023-04-27 00:00:00
        Lowest Humidity: 10.64 on 2023-09-16 00:00:00
        Highest Precipitation: 24.72
        Lowest Precipitation: 0.02
        Highest Wind Speed: 10.73
        Lowest Wind Speed: 0.46
        Highest Pressure: 1038.42
        Lowest Pressure: 986.7

    Average Values:
        Temperature: 19.670109589041097
        Humidity: 59.683863013698634
        Precipitation: 8.009643835616439
        Wind Speed: 5.197808219178082
        Pressure: 1012.544410958904

    Monthly Averages:
                         Date  Temperature   Humidity  Precipitation  Wind_Speed  \
    Month                                                                          
    1     2023-01-16 00:00:00    23.179355  66.561290       8.891613    4.446452   
    2     2023-02-14 12:00:00    26.980000  71.157500       7.511429    5.724286   
    3     2023-03-16 00:00:00    28.790968  79.144839       8.094194    5.388387   
    4     2023-04-15 12:00:00    29.635333  78.974667       8.750333    4.790000   
    5     2023-05-16 00:00:00    26.064839  74.989032       8.208387    4.734516   
    6     2023-06-15 12:00:00    21.892333  64.478333       8.403333    5.355667   
    7     2023-07-16 00:00:00    18.072258  53.906452       8.134516    5.278065   
    8     2023-08-16 00:00:00    13.799032  44.214516       7.124516    5.448065   
    9     2023-09-15 12:00:00     9.185333  39.481667       9.090667    5.559667   
    10    2023-10-16 00:00:00    10.594516  40.958065       7.334194    5.387097   
    11    2023-11-15 12:00:00    12.934667  45.894667       7.828333    5.500333   
    12    2023-12-16 00:00:00    15.457742  57.236129       6.761613    4.825484   

              Pressure YearMonth  
    Month                         
    1      1010.300968   2023-01  
    2      1014.073214   2023-02  
    3      1012.422581   2023-03  
    4      1016.137667   2023-04  
    5      1010.601613   2023-05  
    6      1012.754667   2023-06  
    7      1013.956129   2023-07  
    8      1011.356129   2023-08  
    9      1011.625333   2023-09  
    10     1011.948387   2023-10  
    11     1011.681333   2023-11  
    12     1013.888065   2023-12  
```
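The "Highest ... on ..." lines pair an extreme value with its date. `create_report` finds that date by filtering rows equal to the maximum and taking the first match; `Series.idxmax()` and `idxmin()` return the index label of the first extreme directly, which gives the same row. A sketch, with a hypothetical helper `extreme_with_date` and sample rows borrowed from the example report above:

```python
import pandas as pd


def extreme_with_date(weather: pd.DataFrame, label: str, highest: bool = True):
    """Return (value, date) for the max (or min) of `label`; the first occurrence wins."""
    row = weather[label].idxmax() if highest else weather[label].idxmin()
    return weather.loc[row, label], weather.loc[row, "Date"]


sample = pd.DataFrame({
    "Date": pd.to_datetime(["2023-05-04", "2023-10-01", "2023-07-01"]),
    "Temperature": [40.79, -0.27, 18.0],
})
value, date = extreme_with_date(sample, "Temperature")
print(f"Highest Temperature: {value} on {date}")
# Highest Temperature: 40.79 on 2023-05-04 00:00:00
```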

### <span style='color : #D2B48C'><i>Note</i></span>
<p style="text-indent:1cm;">1. Users must install all required modules before running this notebook. If a module is missing, the import cell stops with a ModuleNotFoundError that names the missing module.</p>
<p style="text-indent:1cm;">2. To install a module, run the following command in a terminal:</p>
<div style="text-align: center;">
  <p style="margin: 0; padding: 0;"><b>pip install module_name</b></p>
</div>
<p style="text-indent:1cm;">3. For example, to install the pandas module, run:</p>
<div style="text-align: center;">
  <p style="margin: 0; padding: 0;"><b>pip install pandas</b></p>
</div>
<p style="text-indent:1cm;">4. Reference: <a href="https://docs.python.org/3/installing/index.html"><u>Installing Python Modules</u></a></p>
<p style="text-indent:1cm;">5. If you cannot run this Jupyter notebook locally (for example, because you do not have an IDE installed), you can run it in <a href="https://colab.research.google.com/"><u>Google Colab</u></a>.</p>
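To list every missing package at once before running anything, a check along these lines could be used. It is a sketch: `find_missing_modules` is a hypothetical helper, and the module list mirrors the third-party imports in Part 4.

```python
import importlib.util

# Third-party packages imported in Part 4 (os, logging, and typing ship with Python).
REQUIRED_MODULES = ["pandas", "matplotlib"]


def find_missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Please install: pip install " + " ".join(missing))
else:
    print("All required modules are installed.")
```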

## <span style = 'color : #FFFACD'>Part 4: Importing Modules</span>

In [21]:
import pandas as pd
import matplotlib.pyplot as plt
import os
import logging
from typing import Union
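`save_report` announces success through `logging.info`, but Python's root logger shows only WARNING and above by default, so that message is normally hidden. One way to surface it, assuming you want log output in the notebook, is to configure logging once right after the imports:

```python
import logging

# Show INFO messages (such as "Report successfully saved to ...") with a timestamp.
# force=True (Python 3.8+) replaces handlers already attached to the root logger,
# which would otherwise make basicConfig a no-op.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    force=True,
)

logging.info("Logging is configured.")
```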

## <span style = 'color : #FFFACD'>Part 5: Helper Functions Implementation</span>

In [22]:
def normalize_path(file_path : str) -> str:
    """
    Normalize a file path by replacing backslashes with forward slashes and removing double quotes.

    Args:
        file_path (str): The original file path.

    Returns:
        str: The normalized file path.
    """
    new_file_path = file_path.replace('"', '').replace('\\', '/')
    return new_file_path


def convert_datetime_format(weather):
    """
    Converts the 'Date' column in the weather DataFrame to datetime format
    and adds a 'YearMonth' column holding the year-month period of each date.

    Args:
        weather (DataFrame): A pandas DataFrame containing a 'Date' column.

    Returns:
        DataFrame: The same DataFrame (modified in place), with 'Date' as datetime and a 'YearMonth' column added.
    """
    weather['Date'] = pd.to_datetime(weather['Date'])
    weather['YearMonth'] = weather['Date'].dt.to_period('M')
    return weather
    

def show_chart(x_label : str, y_label : str, x_data, y_data, title='Bar Chart', color='red', figsize=(15, 3)) -> None:
    """
    Displays a bar chart using the provided data and labels for the X and Y axes.

    Args:
        x_label (str): Label for the X-axis.
        y_label (str): Label for the Y-axis.
        x_data (list): Data for the X-axis.
        y_data (list): Data for the Y-axis.
        title (str, optional): Title of the chart. Default is 'Bar Chart'.
        color (str, optional): Color of the bars. Default is 'red'.
        figsize (tuple, optional): Size of the figure. Default is (15, 3).

    Returns:
        None: Displays the chart.
    """
    plt.figure(figsize=figsize)
    plt.bar(x_data, y_data, color=color)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.title(title)
    plt.grid(True)
    plt.show()


def save_chart_1(x_label : str, y_label : str, x_data, y_data, chart_title : str, des_path : str, figsize=(15,10)) -> None:
    """
    Create a line plot using Matplotlib and save it as a .png file.

    Args:
        x_label (str): Label for the x-axis.
        y_label (str): Label for the y-axis.
        x_data: Data points for the x-axis.
        y_data: Data points for the y-axis.
        chart_title (str): Title of the chart.
        des_path (str): Destination path to save the .png file.
        figsize (tuple, optional): Size of the figure. Default is (15, 10).

    Returns:
        None
    """
    des_path = normalize_path(des_path)
    # plt.savefig does not create missing folders, so create the destination folder first.
    os.makedirs(os.path.dirname(des_path) or '.', exist_ok=True)
    plt.figure(figsize=figsize)
    plt.plot(x_data, y_data)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.title(chart_title)
    plt.savefig(des_path)
    plt.close()
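`convert_datetime_format` groups dates into monthly periods with `Series.dt.to_period('M')`. The short demonstration below (made-up dates) shows what the `YearMonth` values look like and how `astype(str)`, which the plotting functions apply before plotting, turns them into plain `YYYY-MM` labels for the x-axis:

```python
import pandas as pd

dates = pd.Series(["2023-01-16", "2023-01-31", "2023-02-14"])
year_month = pd.to_datetime(dates).dt.to_period("M")

print(year_month.tolist())              # monthly Period objects
print(year_month.astype(str).tolist())  # ['2023-01', '2023-01', '2023-02']
```

Two dates in January collapse to the same period, which is what lets `groupby('YearMonth')` aggregate them together.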

## <span style = 'color : #FFFACD'>Part 6: Important Functions Implementation</span>

In [23]:
def read_csv(csv_path : str) -> Union[pd.DataFrame, None]:
    """
    Read data from a CSV file using pandas.

    Args:
        csv_path (str): The file path to the CSV file.

    Returns:
        pd.DataFrame or None: The DataFrame containing the CSV data, or None if the file cannot be read.
    """
    try:
        weather_data = pd.read_csv(csv_path)
        return weather_data
    except Exception as e:
        print(f"Error reading CSV file '{csv_path}': {e}")
        return None


def save_monthly_average(weather, label):
    """
    Save a line chart of the monthly average of one column to the Average folder.

    Args:
        weather (DataFrame): DataFrame containing a 'YearMonth' column and the column named by `label`.
        label (str): The column whose monthly average is plotted.

    Returns:
        None: The chart is saved as Average/Average_Monthly_<Label>.png.
    """
    monthly_avg = weather.groupby('YearMonth')[label].mean().reset_index()
    monthly_avg['YearMonth'] = monthly_avg['YearMonth'].astype(str)
    save_chart_1(
        'Time',
        f'Average {label}',
        monthly_avg['YearMonth'],
        monthly_avg[label],
        f'Average Monthly {label.capitalize()}',
        f'Average/Average_Monthly_{label.capitalize()}.png',
    )


def save_monthly_min_max(weather, label):
    """
    Calculate the monthly minimum and maximum of one column and save both on the same line chart.

    Args:
        weather (pd.DataFrame): DataFrame containing a 'YearMonth' column and the column named by `label`.
        label (str): The column whose monthly minimum and maximum are plotted.

    Returns:
        None: The chart is saved as Min_Max/Min_Max_Monthly_<Label>.png.
    """
    grouped = weather.groupby('YearMonth')[label]
    monthly_min, monthly_max = grouped.min().reset_index(), grouped.max().reset_index()
    monthly_min['YearMonth'] = monthly_min['YearMonth'].astype(str)
    os.makedirs('Min_Max', exist_ok=True)  # plt.savefig does not create missing folders
    plt.figure(figsize=(15,10))
    plt.plot(monthly_min['YearMonth'], monthly_min[label], label='Min')
    plt.plot(monthly_min['YearMonth'], monthly_max[label], label='Max')
    plt.xlabel('Time')
    plt.ylabel(label)
    plt.title(f'Min - Max Monthly {label.capitalize()}')
    plt.legend()
    plt.savefig(f'Min_Max/Min_Max_Monthly_{label.capitalize()}.png')
    plt.close()


def create_report(weather) -> str:
    """
    Generate a weather report based on the provided weather data.

    Args:
        weather (DataFrame): DataFrame containing weather data with columns ['Date', 'Temperature', 'Humidity', 'Precipitation', 'Wind_Speed', 'Pressure']. The 'Date' column is converted to datetime here, and a 'Month' column is added to the DataFrame.

    Returns:
        report (str): Formatted string containing various statistics and averages of the weather data.
    """
    weather['Date'] = pd.to_datetime(weather['Date'])
    weather['Month'] = weather['Date'].dt.month

    report = f"""Report
    
Basic statistics:
    Highest Temperature: {weather['Temperature'].max()} on {weather[weather['Temperature'] == weather['Temperature'].max()]['Date'].iloc[0]}
    Lowest Temperature: {weather['Temperature'].min()} on {weather[weather['Temperature'] == weather['Temperature'].min()]['Date'].iloc[0]}
    Highest Humidity: {weather['Humidity'].max()} on {weather[weather['Humidity'] == weather['Humidity'].max()]['Date'].iloc[0]}
    Lowest Humidity: {weather['Humidity'].min()} on {weather[weather['Humidity'] == weather['Humidity'].min()]['Date'].iloc[0]}
    Highest Precipitation: {weather['Precipitation'].max()}
    Lowest Precipitation: {weather['Precipitation'].min()}
    Highest Wind Speed: {weather['Wind_Speed'].max()}
    Lowest Wind Speed: {weather['Wind_Speed'].min()}
    Highest Pressure: {weather['Pressure'].max()}
    Lowest Pressure: {weather['Pressure'].min()}

Average Values:
    Temperature: {weather['Temperature'].mean()}
    Humidity: {weather['Humidity'].mean()}
    Precipitation: {weather['Precipitation'].mean()}
    Wind Speed: {weather['Wind_Speed'].mean()}
    Pressure: {weather['Pressure'].mean()}

Monthly Averages:
{weather.groupby('Month').mean()}
"""

    return report

def save_report(report_content: str, path: str) -> None:
    """
    Save the given report content to a file at the specified path.

    Args:
        report_content (str): The content of the report to be saved.
        path (str): The file path where the report will be saved.

    Returns:
        None

    Logs:
        Logs a success message with the path when the report is saved.
        Logs an error message with details when saving the report fails.
    """
    try:
        directory = os.path.dirname(path)
        if directory:  # dirname is '' for a bare file name, and os.makedirs('') would fail
            os.makedirs(directory, exist_ok=True)
        with open(path, 'w', encoding='utf-8') as file:
            file.write(report_content)
        logging.info(f"Report successfully saved to {path}")
    except Exception as e:
        logging.error(f"Failed to save report to {path}: {e}")
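`os.path.dirname` returns an empty string for a bare file name such as `'Report.txt'`, and `os.makedirs('')` raises `FileNotFoundError`, so that case needs special handling. With `pathlib`, the parent of a bare file name is `Path('.')`, so no special case is needed. A sketch of that alternative (the name `save_report_pathlib` is hypothetical):

```python
import tempfile
from pathlib import Path


def save_report_pathlib(report_content: str, path: str) -> Path:
    """Write the report to `path`, creating parent folders if needed."""
    target = Path(path)
    # For a bare file name, target.parent is Path('.'), which already exists.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(report_content, encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    saved = save_report_pathlib("Report\n", f"{tmp}/Report/Report.txt")
    print(saved.read_text(encoding="utf-8"))
```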

## <span style = 'color : #FFFACD'>Part 7: Main Function Implementation</span>

In [24]:
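def main():
    # Relative path to the Data folder described in Part 3; adjust it if your copy lives elsewhere.
    weather = read_csv(os.path.join('Data', 'Data.csv'))
    if weather is None:
        return  # read_csv has already printed the error

    convert_datetime_format(weather)

    # Skip the first column ('Date') and the last one ('YearMonth', added above).
    labels = weather.columns.to_list()[1:-1]

    for item in labels:
        save_monthly_average(weather, item)

    for item in labels:
        save_monthly_min_max(weather, item)

    report = create_report(weather)

    save_report(report, 'Report/Report.txt')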
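`plt.savefig` does not create missing folders on its own. If the output folders from Part 3 may not exist yet, they can be created up front before `main()` writes into them. A sketch, where the helper name `ensure_output_folders` and its `base_dir` parameter are hypothetical:

```python
import os
import tempfile

# Output folders named in Part 3 that the notebook writes into.
OUTPUT_FOLDERS = ["Average", "Min_Max", "Report"]


def ensure_output_folders(base_dir: str = ".") -> list:
    """Create the output folders under base_dir (if missing) and return their paths."""
    paths = [os.path.join(base_dir, name) for name in OUTPUT_FOLDERS]
    for path in paths:
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return paths


with tempfile.TemporaryDirectory() as tmp:
    created = ensure_output_folders(tmp)
    print(all(os.path.isdir(p) for p in created))  # True
```

In the notebook itself, calling `ensure_output_folders()` once at the start of `main()` would cover all three folders.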

## <span style = 'color : #FFFACD'>Part 8: Call Main</span>

In [25]:
if __name__ == '__main__':
    main()

## <span style = 'color : #FFFACD'>Part 9: References</span>
<p style='text-indent:1cm'>Python: <a href="https://www.python.org/"><u>View Python</u></a></p>
<p style='text-indent:1cm'>Jupyter notebook: <a href="https://jupyter.org/"><u>View Jupyter notebook</u></a></p>
<p style='text-indent:1cm'>Module pandas: <a href="https://pandas.pydata.org"><u>View Pandas</u></a></p>
<p style='text-indent:1cm'>Module matplotlib: <a href="https://matplotlib.org/"><u>View Matplotlib</u></a></p>
<p style='text-indent:1cm'>Module os: <a href="https://docs.python.org/3/library/os.html"><u>View OS</u></a></p>