# Phase 2: Researching and Planning

## Research Your Chosen Issue: Dive deep into your chosen data scenario. What's the problem or topic you're exploring? Document your findings in Jupyter notebook.
The purpose of this task is to predict the weather for a particular place in Australia. This matters because it lets people plan their day: bring a jumper if it's going to be cold, an umbrella if it's going to rain, and so on. I think the analysis is worth doing because it gives a second opinion on the weather. My algorithm may be more or less accurate than the official forecast, but people can compare the two and make a more informed guess. Everyone can benefit from the results, since everyone wants to know how their day will go, whether they're planning an outing or deciding whether to take a jumper to work.

### Privacy and security
I'm sourcing my data from several places:
 - The Australian Bureau of Meteorology's FTP site
   - FTP is **NOT** secure **AT ALL** (the traffic is unencrypted), but the data is publicly available and no credentials are used to log in. That makes it OK: anyone intercepting the requests would only get the same data that is available to everyone else, and nothing private. And if someone really wanted your IP address, there are much easier, faster and cheaper ways to get it.
// TODO: Finish

Also, if I were to publish this app, I would have to make sure that no user's data ever leaves their computer, and that the data is not available to anyone without access.
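
The anonymous, read-only access described above could be sketched with Python's standard `ftplib`. This is only a minimal sketch, not the project's actual downloader: the helper names `fetch_file` and `fetch_anonymously` are made up for illustration, and the host and remote path have to come from the links on the BoM datasets page. Nothing beyond the anonymous login is sent, so only public data crosses the wire.

```python
from ftplib import FTP
from pathlib import Path


def fetch_file(ftp, remote_path, local_path):
    """Download one file over an already-open FTP connection.

    `ftp` is anything with ftplib.FTP's `retrbinary` method, which keeps
    this helper testable without touching the network.
    """
    local_path = Path(local_path)
    local_path.parent.mkdir(parents=True, exist_ok=True)
    with open(local_path, "wb") as out:
        # RETR streams the remote file in binary chunks to out.write.
        ftp.retrbinary(f"RETR {remote_path}", out.write)
    return local_path


def fetch_anonymously(host, remote_path, local_path):
    """Log in with the anonymous account (no real credentials) and download."""
    with FTP(host, timeout=30) as ftp:
        ftp.login()  # no arguments means user "anonymous"
        return fetch_file(ftp, remote_path, local_path)
```

Splitting the download from the login means the download step can be checked with a fake connection object before pointing it at the real server.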

### Cyber security
// TODO: Finish

## Data dictionaries
### MAIN DICTIONARY
```json
{
    "Temps": TEMPS,
    "Rain": RAIN,
    "Names": NAMES
}
```
### TEMPS
```json
{
    LocationCode: {
        "Date": [
            Date,
            Date,
            ...
        ],
        "MaxTemp": [
            Temp,
            Temp,
            ...
        ],
        "MinTemp": [
            Temp,
            Temp,
            ...
        ]
    }
}
```
| Field | Datatype | Format for Display | Description | Example | Validation |
|-------|----------|--------------------|-------------|---------|------------|
| Date | Timestamp | YYYY-MM-DD | The date of the temperature reading | 2021-01-01 | Must be a valid date |
| MaxTemp | Float64 | NNN.N | The maximum temperature for the day | 30.0 | Must be a valid number |
| MinTemp | Float64 | NNN.N | The minimum temperature for the day | 20.0 | Must be a valid number |
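
The validation column above could be checked with something like the sketch below. It assumes the dates have been stored as `YYYY-MM-DD` strings (the table lists them as Timestamps, so this is an assumption about the serialised form), and it adds one check the table doesn't state: that every temperature list is as long as the date list. The function name `validate_temps_entry` is hypothetical.

```python
import math
from datetime import date


def is_valid_number(value):
    """True for a real, finite int or float (NaN and infinity are rejected)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def validate_temps_entry(entry):
    """Return a list of problems found in one location's TEMPS entry."""
    problems = []
    dates = entry.get("Date", [])
    for field in ("MaxTemp", "MinTemp"):
        values = entry.get(field, [])
        # Extra check (not in the table): one reading per date.
        if len(values) != len(dates):
            problems.append(
                f"{field} has {len(values)} values for {len(dates)} dates"
            )
        for i, value in enumerate(values):
            if not is_valid_number(value):
                problems.append(f"{field}[{i}] is not a valid number: {value!r}")
    for i, text in enumerate(dates):
        try:
            date.fromisoformat(text)  # accepts YYYY-MM-DD
        except (TypeError, ValueError):
            problems.append(f"Date[{i}] is not a valid date: {text!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see everything wrong with a station's data in one pass.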

### RAIN
```json
[
    {
        LOCATIONCODE: {
            "Date": [
                Date,
                Date,
                ...
            ],
            "Rainfall": [
                Rainfall,
                Rainfall,
                ...
            ]
        }
    }, 
    {
        LOCATIONCODE: NAME,
        ...
    }
]
```
| Field | Datatype | Format for Display | Description | Example | Validation |
|-------|----------|--------------------|-------------|---------|------------|
| Date | Timestamp | YYYY-MM-DD | The date of the rainfall reading | 2021-01-01 | Must be a valid date |
| Rainfall | Float64 | NNN.N | The amount of rainfall for the day | 20.0 | Must be a valid number |
| LOCATIONCODE | Int64 | NNNNNN | The location code for the place in question | 4035 | Must be at most 6 digits, all numeric, not negative |
| NAME | Object | XXXXXXXX | The name of the place | ROEBOURNE | Can be anything, but seems to be all uppercase in the data |
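
Since RAIN is a two-part list (readings by location code, then names by location code), reading it usually means joining the two halves. Here is a rough sketch of that join. The function name `iter_rain_rows` and the `"Unknown"` fallback for a code with no name are my assumptions, not part of the data.

```python
def iter_rain_rows(rain):
    """Yield (location_code, name, date, rainfall) from the two-part RAIN list."""
    readings_by_code, names_by_code = rain
    for code, readings in readings_by_code.items():
        # Assumed fallback: a code missing from the name map is "Unknown".
        name = names_by_code.get(code, "Unknown")
        for day, rainfall in zip(readings["Date"], readings["Rainfall"]):
            yield code, name, day, rainfall


# Example using the values from the table above.
example = [
    {4035: {"Date": ["2021-01-01"], "Rainfall": [20.0]}},
    {4035: "ROEBOURNE"},
]
for row in iter_rain_rows(example):
    print(row)
```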

### NAMES
| Field | Datatype | Format for Display | Description | Example | Validation |
|-------|----------|--------------------|-------------|---------|------------|
| Location | Int64 | NNNNNN | The location code for the place in question | 14723 | Must be at most 6 digits, all numeric, not negative |
| State | Object | XXX | The state that the location is in, or Unknown | NT | Must be 2 or 3 upper case letters OR "Unknown" |
| Name | Object | XXXXXXXX | The name of the place | BORROLOOLA AIRPORT | Can be anything, but seems to be all uppercase in the data |
| Lat | Float64 | NNN.NNNN | The latitude of the location | -16.0755 | Must be a valid number |
| Long | Float64 | NNN.NNNN | The longitude of the location | 136.3041 | Must be a valid number |


I am exploring the problem of predicting the weather with a neural network. My research shows that this is a very hard problem [that has a lot of different aspects to it](https://media.bom.gov.au/social/blog/1696/explainer-how-meteorologists-forecast-the-weather/), and I probably have little chance of getting good results. *But* I can try, so here goes.

Predicting the weather is a very hard task. Forecasters draw on many inputs: satellite measurements, radar, weather stations gathering information, and human experience. The weather is a chaotic system, which is what makes it so hard to predict. But if I can get enough data, my predictions should come at least a little close to correct :)

[According to sources](https://ncas.ac.uk/learn/what-causes-weather/), weather is caused by temperature, atmospheric pressure, cloud formation, wind, humidity and rain. So I think including all of these in the dataset should be enough to get a reasonable prediction.

### Existing code on how to do this
 - https://medium.com/@sebastienwebdev/forecasting-weather-patterns-with-lstm-a-python-guide-without-dates-433f0356136c
    - Could not find the dataset they mentioned to train it on to reproduce the results
 - https://www.kaggle.com/code/syedali110/weather-prediction-using-rnn
    - DOES ACTUALLY WORK, I TESTED IT
    - Uses a dataset from Seattle
    - Uses an RNN

## Find Relevant Data: Search for datasets related to your issue. Ensure they align with cybersecurity and privacy principles.
### Datasets:
 - I could try scraping http://www.bom.gov.au/climate/change/hqsites/
 - [I FOUND ALL THE TEMPERATURE DATA I'LL EVER NEED](http://www.bom.gov.au/climate/data/acorn-sat/#tabs=Data-and-networks)
 - Lots of other data (daily/monthly rainfall, monthly pan evaporation and cloud data) found [here](http://www.bom.gov.au/climate/change/datasets/datasets.shtml) (well, the links to the FTP server are there, and the FTP server still runs, so...)
### APIs:
 - https://open-meteo.com/en/docs/bom-api
 - https://openweathermap.org/city/2163137 (But I think it needs an API key... unless I can scrape it `(:<` )
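
For the Open-Meteo option, a request could look roughly like the sketch below. Treat the details as assumptions to check against the docs page linked above: the endpoint `https://api.open-meteo.com/v1/bom`, the `daily` variable names and the `timezone` parameter are what I believe Open-Meteo uses, and `build_forecast_url` / `fetch_forecast` are hypothetical helper names. As far as I know no API key is needed for non-commercial use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm against the Open-Meteo BOM docs before relying on it.
BASE_URL = "https://api.open-meteo.com/v1/bom"

# Assumed variable names for daily max/min temperature and total precipitation.
DEFAULT_DAILY = ("temperature_2m_max", "temperature_2m_min", "precipitation_sum")


def build_forecast_url(lat, lon, daily=DEFAULT_DAILY):
    """Build the request URL for one location's daily values."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "daily": ",".join(daily),
        "timezone": "auto",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_forecast(lat, lon):
    """Fetch and decode the JSON response (this one does hit the network)."""
    with urlopen(build_forecast_url(lat, lon), timeout=30) as response:
        return json.load(response)
```

Keeping the URL building separate from the network call means the request can be inspected (and tested) without sending anything.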

## Planning: Create a data dictionary to show the type of data and parameters required for your dataset, then use this information to create a data-flow diagram.
Data needed:
 - temperature
 - atmospheric pressure
 - cloud formation
 - wind
 - humidity
 - rain
### Data flow diagram
```mermaid
flowchart LR
    one["Data in (see above)"]
    two["Data cleaning"]
    three["Neural network training"]
    four["Information out"]
    five["Data visualisation"]
    one --> two --> three --> four --> five
```
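
One possible shape for the hand-off from "Data cleaning" to "Neural network training" is sketched below. It assumes the network will be trained on fixed-length windows of past days to predict the next day, which this plan doesn't actually fix yet, and the names `clean_series` and `make_windows` are hypothetical. The cleaning step is the simplest one imaginable (drop missing readings), not a recommendation.

```python
import math


def clean_series(values):
    """Drop missing readings (None or NaN) from one daily series.

    Note: dropping days breaks the even daily spacing, so filling the
    gaps (e.g. by interpolation) may turn out to be a better choice.
    """
    return [v for v in values if v is not None and not math.isnan(v)]


def make_windows(series, window=7):
    """Turn a series into (past `window` days, next day) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# Example: a short max-temperature series with two gaps, window of 3 days.
max_temps = clean_series([30.0, None, 31.5, float("nan"), 29.0, 28.5, 27.0])
inputs, targets = make_windows(max_temps, window=3)
```

Each entry in `inputs` is a list of `window` consecutive readings and the matching entry in `targets` is the reading that follows it, which is the usual input/label layout for a sequence model.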