# Identifying and Defining

### Choose your data scenario and define your purpose
**Data:** I'm analysing weather data from across Australia over a 10-year span (2007-2017)\
**Goal:** To find out what the weather patterns are and how they change over a long period of time\
**Source:** https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package\
**Access:** Data is publicly available\
**Access Method:** Accessible through .csv




# Functional Requirements

**Data Loading:** \
Description: The program will be able to load a .csv file.\
Input: A .csv file\
Output: A processed and loaded .csv file\
**Data Cleaning:** \
Description: The program cleans data by removing any unnecessary information\
Input: A processed .csv file\
Output: Clean and usable pandas dataframe\
**Data Analysis:** \
Description: The system needs to identify the trend in data\
Input: A cleaned .csv file\
Output: Shows the trend in a piece of data using the most important information\
**Data Visualisation:** \
Description: The data will be visualised using a matplotlib chart\
Input: A .csv file that displays a trend\
Output: A matplotlib chart showing information\
**Data Reporting:** \
Description: The matplotlib chart will be stored as a png file with the given data\
Input: A matplotlib chart along with the information\
Output: A matplotlib chart stored as a png



# Use Cases

**Data Loading**\
Actor: User\
Goal: To load a dataset into the system.\
Preconditions: User has a dataset file ready.\
Main Flow:
1. User places the dataset for reading into the correct folder.
2. System validates the file format.
3. System loads the dataset and displays the information in a dataframe.\
Postconditions: Data is loaded and ready for analysis
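
The loading flow above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `load_dataset` is hypothetical, and it assumes the file has a `Date` column as listed in the Data Dictionary.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Load a .csv file into a pandas DataFrame after checking the file format."""
    path = Path(path)
    # Step 2 of the flow: validate the file format before reading.
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")
    if not path.exists():
        raise FileNotFoundError(f"No file found at: {path}")
    # Step 3: load the dataset into a dataframe.
    df = pd.read_csv(path)
    # Assumption: dates are stored in a "Date" column; unparseable values become NaT.
    if "Date" in df.columns:
        df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    return df
```

A call might look like `df = load_dataset("data/weather.csv")`, where the path is a placeholder for wherever the downloaded file is stored.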

**Data Cleaning**\
Actor: User\
Goal: To clean data and remove any errors.\
Preconditions: The data file has been loaded.\
Main Flow:
1. User places the dataset to be scanned for errors.
2. System removes any incorrect information.
3. System loads the cleaned data in a dataframe for further analysis.\
Postconditions: Data is cleaned and can now be investigated
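
One possible way to do the cleaning step is sketched below. It assumes that "unnecessary information" means columns outside the Data Dictionary and rows with missing values; the name `clean_dataset` and the `KEEP_COLUMNS` list are illustrative only.

```python
import pandas as pd

# Assumption: these are the columns worth keeping (taken from the Data Dictionary).
KEEP_COLUMNS = ["Date", "Location", "Rainfall", "Temp3pm", "Humidity3pm"]


def clean_dataset(df, columns=KEEP_COLUMNS):
    """Keep only the wanted columns and drop rows that have missing values."""
    # Ignore any wanted column that is not actually in the file.
    present = [c for c in columns if c in df.columns]
    cleaned = df[present].dropna()
    # Renumber the rows so the cleaned dataframe starts from 0 again.
    return cleaned.reset_index(drop=True)
```

Dropping rows is the simplest option; filling missing values with an average would be another choice, but it changes the data rather than just removing it.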

**Data Analysis**\
Actor: User\
Goal: To look for a trend in clean data.\
Preconditions: The data file has been processed and cleaned.\
Main Flow:
1. User places the dataset for identifying trends.
2. System removes any unnecessary information.
3. System loads the analysed data in a dataframe for visualisation.\
Postconditions: The data trend has been found and is ready to be visualised
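
As a rough example of what "identifying a trend" could mean in code, the sketch below totals rainfall per year for one location. The function name `yearly_rainfall` is hypothetical, and the choice of a yearly total is an assumption; the real program may summarise the data differently.

```python
import pandas as pd


def yearly_rainfall(df, location):
    """Return total rainfall per calendar year for one location."""
    subset = df[df["Location"] == location].copy()
    # Make sure dates are real datetimes so the year can be extracted.
    subset["Date"] = pd.to_datetime(subset["Date"])
    # Group the daily rows by year and add up the rainfall in each group.
    return subset.groupby(subset["Date"].dt.year)["Rainfall"].sum()
```

The result is a pandas Series indexed by year, which can be passed straight to a chart.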

**Data Visualisation**\
Actor: User\
Goal: To visualise data using a matplotlib chart.\
Preconditions: The data has an identified trend.\
Main Flow:
1. User places the dataset for finding trends.
2. System removes any unnecessary information.
3. System displays a matplotlib chart with trends.\
Postconditions: Data is visualised as a matplotlib chart and can be saved
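
The visualisation step might look something like the sketch below, which draws a line chart of rainfall over time for one location. `plot_rainfall` is a made-up name and the line chart is an assumption; the actual program may use a different chart type.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_rainfall(df, location, title=None):
    """Draw a line chart of rainfall over time for one location and return the figure."""
    subset = df[df["Location"] == location].copy()
    # Convert and sort the dates so the line runs from oldest to newest.
    subset["Date"] = pd.to_datetime(subset["Date"])
    subset = subset.sort_values("Date")
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(subset["Date"], subset["Rainfall"])
    ax.set_xlabel("Date")
    ax.set_ylabel("Rainfall")
    ax.set_title(title or f"Rainfall in {location}")
    return fig
```

Calling `plt.show()` after `plot_rainfall(df, "Canberra")` would display the chart on screen.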

**Data Reporting**\
Actor: User\
Goal: To save the matplotlib chart as a png.\
Preconditions: A matplotlib chart is ready to be saved.\
Main Flow:
1. User places the data to save as a png and .csv file.
2. System saves the matplotlib chart as a png.
3. System saves the information as a .csv file.\
Postconditions: A .csv file and a png file display the data.
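
A minimal sketch of the reporting step is shown below. It assumes the chart is a matplotlib figure and the data is a pandas dataframe; `save_report`, the output folder and the file name are all placeholders.

```python
from pathlib import Path


def save_report(fig, df, out_dir, name):
    """Save a matplotlib figure as a png and its data as a .csv file."""
    out_dir = Path(out_dir)
    # Create the output folder if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    png_path = out_dir / f"{name}.png"
    csv_path = out_dir / f"{name}.csv"
    fig.savefig(png_path, dpi=150, bbox_inches="tight")
    # index=False leaves the dataframe's row numbers out of the file.
    df.to_csv(csv_path, index=False)
    return png_path, csv_path
```

Returning both paths makes it easy to tell the user where the files were written.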

# Non-Functional Requirements

**Usability**: The user interface needs to handle errors, stay consistent, and be simple to use overall. The README should give a clear guide to using the software, with key elements such as a title, a description and the content that follows.

**Reliability**: The system needs to make sure that the data is free of errors and incorrect information. When it does encounter an error, the user needs to be made aware of the issue and the error should be removed.
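
To illustrate the error-handling requirement, the sketch below shows one way a text menu could deal with a misinput without crashing. It assumes the interface is a simple text prompt; `parse_menu_choice` and `ask_until_valid` are hypothetical names, not functions from the project.

```python
def parse_menu_choice(raw, valid_choices):
    """Return the cleaned choice, or None if the input is not a valid option."""
    choice = raw.strip().lower()
    return choice if choice in valid_choices else None


def ask_until_valid(prompt, valid_choices, read=input):
    """Keep asking until the user enters one of the valid choices."""
    while True:
        choice = parse_menu_choice(read(prompt), valid_choices)
        if choice is not None:
            return choice
        # Tell the user what went wrong instead of failing silently.
        print(f"Sorry, that is not an option. Choose from: {', '.join(valid_choices)}")
```

The `read` parameter defaults to `input`, so the same function can be tested by passing in a fake reader.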

# Research and Planning

### **Research of Chosen Issue**

**Purpose:** The purpose of this data is to investigate and analyse the weather patterns in Australia and how they change over a number of years. This way, we can predict many of the upcoming weather events in individual cities.

**Missing Data:** This dataset only covers 10 years. That gives a reliable overview of average temperatures and their increase during that period, but we may not have this kind of information over a much longer period of time.

**Stakeholders:** Regular people may find this data useful as it provides a detailed insight into the daily weather across many cities around Australia, although weather patterns may have changed since this dataset was made, so there may be some inaccuracies. The data will also benefit meteorologists, as they can break it down to find small trends across the whole country which could be the cause of weather events today.

**Use:** This data will be useful for people who study weather professionally, as it allows them to compare current weather patterns with those in the dataset. From that comparison they could work out what has caused certain changes in the weather and use that information to predict the weather in the future.


### **Privacy and Security**

**Data Privacy of Source:** I am sourcing my data from Kaggle.com, which is a public data source that everyone can use. Since this dataset is about the weather around Australia, there is no need to protect any of the data from other people.

**Application Data Privacy:** There aren't many responsibilities in maintaining this data, as it doesn't contain any identifiable data about any individual other than minor details such as the username and password used in the login process.

**Cyber Security:** To maintain cybersecurity, a variety of methods can be used to make sure that the user is not impersonating anyone else.\
User authentication: Proving that someone is who they claim to be with passwords, PINs, fingerprints, etc.
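
As one example of password-based authentication, the sketch below stores a salted hash instead of the password itself and checks a login attempt against it. This is an assumption about how a login could be built using Python's standard library, not a description of the project's login code; the function names and the `ITERATIONS` value are illustrative.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; higher is slower but harder to brute-force


def hash_password(password, salt=None):
    """Return (salt, digest) so the plain password never needs to be stored."""
    # A fresh random salt means two users with the same password get different digests.
    salt = salt if salt is not None else secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Check a login attempt against the stored salt and digest."""
    _, digest = hash_password(password, salt)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(digest, expected_digest)
```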

# Data Dictionary

|Field|Datatype|Format or Display|Description|Example|Validation|
|:----:|:----:|:----:|:----:|:----:|:----:|
|Date|datetime64|YYYY-MM-DD|The date the data was recorded|2010-07-23|Must be in the format of YYYY-MM-DD, cannot be in reverse order|
|Location|object|XX...XX|The place where the data was recorded|Penrith|Can be any amount of characters but must not include numbers|
|Rainfall|float64|N.NN|The amount of rainfall in a location|1.2|Must be a decimal number to 2 decimal places|
|Temp3pm|float64|NN.N|The temperature at 3pm in a location|30.1|Must be a decimal number to 1 decimal place|
|Humidity3pm|integer|NN|The humidity at 3pm in a location|20|Must be a whole number up to 2 digits|
|Windspeed3pm|integer|NN|The windspeed at 3pm in a location|18|Must be a whole number up to 2 digits|
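
Some of the validation rules in the table could be checked with a short function like the one below. It is only a sketch: `validate` is a hypothetical name, it covers the `Date`, `Location`, `Rainfall` and `Temp3pm` fields, and it checks that numeric fields are numbers without checking decimal places.

```python
import pandas as pd


def validate(df):
    """Return a dict of field name -> number of rows that break that field's rule."""
    problems = {}
    # Date must be in YYYY-MM-DD format; anything else becomes NaT and is counted.
    dates = pd.to_datetime(df["Date"].astype(str), format="%Y-%m-%d", errors="coerce")
    problems["Date"] = int(dates.isna().sum())
    # Location must not include numbers.
    has_digit = df["Location"].astype(str).str.contains(r"\d")
    problems["Location"] = int(has_digit.sum())
    # Rainfall and Temp3pm must be numeric; non-numeric values become NaN and are counted.
    for col in ["Rainfall", "Temp3pm"]:
        numeric = pd.to_numeric(df[col], errors="coerce")
        problems[col] = int(numeric.isna().sum())
    return problems
```

A result where every count is 0 means no row broke the rules that were checked.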




# Producing and Implementing

### **Documenting your Testing**



# Testing and Evaluating

### **Analyse and Conclude**

**Data Visualisation:** I produced 3 charts with my data analysis program. These showed rain data in Canberra across all years, rain data in 2007, and rain data in 2017.\
**Calculations:** The calculations in my data analysis process included removing specific rows and columns so I could get my desired data, as well as removing any missing values. This turned out to be very accurate.\
**Accuracy:** The information that I retrieved is fairly accurate, as it came from daily weather observations from numerous weather stations around Australia.\
**Conclusions:** From the data provided, we can compare the rain data over 10 years and relate it to any weather events that occurred in the same period to discover how they impact rainfall in Canberra. Further observations can be made by locating the times with the least and the most rainfall.
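
The split into "all years", "earliest year" and "latest year", and the search for the least and most rainfall, could be done along these lines. The function names are hypothetical and the sketch assumes `Date`, `Location` and `Rainfall` columns as in the Data Dictionary.

```python
import pandas as pd


def split_first_and_last_year(df, location="Canberra"):
    """Return (all rows, earliest-year rows, latest-year rows) for one location."""
    subset = df[df["Location"] == location].copy()
    subset["Date"] = pd.to_datetime(subset["Date"])
    years = subset["Date"].dt.year
    first = subset[years == years.min()]
    last = subset[years == years.max()]
    return subset, first, last


def rainfall_extremes(df):
    """Return the rows with the least and the most rainfall."""
    driest = df.loc[df["Rainfall"].idxmin()]
    wettest = df.loc[df["Rainfall"].idxmax()]
    return driest, wettest
```

Each of the three dataframes can then be charted separately, which gives the comparison between the oldest and newest data.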


**Ansh:** Samraj's graphs are very sophisticated and his code is easy to understand and use. All the functions operate seamlessly, with no issues occurring. The way he singled out the oldest data and the newest data to show the pattern and effect of climate change is ingenious.

**Yyoung:** Functional Requirements: Samraj's program is able to load a .csv file as described in the functional requirements. He was also able to remove any unnecessary information and output a cleaned .csv file. He was able to create code that visualises the data and outputs a matplotlib graph.\
Non-Functional Requirements: His program was able to handle errors and stay consistent throughout testing. His system made sure that the data was free of errors and incorrect information. When it encountered a misinput, it did not crash and resolved the issue.

# Evaluation

My data analysis fulfilled the goal of displaying datasets in a chart using a user interface system. The program successfully loaded a .csv file, which was then cleaned and divided into smaller datasets for comparison and analysis. After producing a clean dataset, the system visualises the data in a matplotlib chart which displays all the information from the .csv file. However, the data in the matplotlib chart could have been displayed more accurately.

The usability and reliability requirements were that the user interface is easy to navigate and flexible when encountering errors such as misinputs. When I tested the UI, I found that the system could handle these errors very well without breaking. This indicates that the user would have an easy time accessing the data, without any incorrect information.

By the end of this task, I needed to be able to load a .csv file containing data and display it as a chart after cleaning and analysing it. After multiple attempts, I achieved that goal and created an easy way to display the data. I conducted precise testing and debugging so that I would be able to display 3 different charts containing different data. My original idea was to display the rain data for Canberra in each individual year, but I didn't spend as much time on that as I should have, so I simplified the idea and made two charts covering the earliest and the latest year in which data was recorded. This still met my intention of being able to compare two pieces of data from different years.