# Phase 2: Researching and Planning

## Research Your Chosen Issue: Dive deep into your chosen data scenario. What's the problem or topic you're exploring? Document your findings in a Jupyter notebook.
The purpose of this task is to predict what the weather will be for a particular place in Australia. This matters because it lets people plan their day: bring a jumper if it's going to be cold, an umbrella if it's going to rain, and so on. I feel this analysis is worth carrying out because it provides another viewpoint on what the weather may be. My algorithm may be more or less accurate than the official forecast, but people can compare the two and make a more informed guess. Everyone can benefit from the information this will create, since everyone wants to know how their day will be, from planning an outing to deciding whether or not to bring a jumper to work. Through research I have found that this is a very hard problem [that has a lot of different aspects to it](https://media.bom.gov.au/social/blog/1696/explainer-how-meteorologists-forecast-the-weather/), but this project will give it my best shot and provide another viewpoint on what the weather may be like.

### Privacy and security
I'm sourcing my data from different sources:
 - The Australian Bureau of Meteorology's (BOM) FTP site
   - There is lots of data the BOM tries to protect, and in my opinion it does not do a good job of it. However, its FTP site is open to all and the BOM does not mind people using the data there.
   - FTP is **NOT** secure **AT ALL** because it sends everything unencrypted, but the data is publicly available and no credentials are used to log in. This makes it OK: anyone who intercepted the data requests would only get the same data that is available to everyone else, and nothing private.
   - Even so, still using an outdated method of file transfer is not very good practice, and arguably falls short of the BOM's responsibility to protect users' privacy. But with data that is already open to the internet, there is nothing really to hide.
 - [An open-source Bureau of Meteorology API](https://open-meteo.com/en/docs/bom-api) to get the current and past couple of days of weather
   - This is secure because requests are sent over HTTPS, which encrypts the traffic between my computer and the server.
   - The data this source needs to protect is its weather data, and it needs to protect users' privacy by not sharing their data with any external parties.
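
As a rough illustration of the second source, the sketch below only builds the request URL for recent hourly temperatures; it does not send anything. The endpoint path `/v1/bom`, the parameter names and the helper name `build_bom_url` are my assumptions based on the Open-Meteo docs linked above, and should be checked against them before use.

```python
from urllib.parse import urlencode

# Assumed endpoint for Open-Meteo's BOM model; check the linked docs.
BASE_URL = "https://api.open-meteo.com/v1/bom"


def build_bom_url(latitude, longitude, past_days=2):
    """Build an HTTPS request URL for hourly 2 m temperature (hypothetical helper)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m",
        "past_days": past_days,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Example coordinates taken from the NAMES table (BORROLOOLA AIRPORT).
url = build_bom_url(-16.0755, 136.3041)
print(url)
```

Because the URL uses `https://`, the coordinates in the query string are encrypted while they travel over the network.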

Also, if I were to publish this app, I would have to make sure that no user's data ever leaves their computer and that this data is not available to people without access. The only identifiable data would be the location the user chose to train the AI on.

If I were to release the application to the general public, I would need to ensure it is *scalable*, which means that many people can use it at the same time. I would also need to encrypt data such as people's locations (of where they want to train the AI) so that no one who intercepts it can read it. Anything not related to someone personally (e.g. the website HTML itself) doesn't need encryption.

#### Cyber security
Applications should have many different layers of protection in order to maintain cybersecurity. Here are some of them:
 - User authentication: The user needs to be logged in, so that any suspicious activity can be traced and stopped, and so that the only people who can access the data are those who should.
 - Password hashing: This is the process of running a password through a one-way function before storing it, so the server only keeps an unreadable hash and never the real password. When the user logs in, the server hashes what they typed and compares it with the stored hash, so anyone who steals the stored data cannot read the passwords. Protecting the password while it travels over the network is the job of encryption.
 - Encryption: This is like "locking" your data with a key. It applies a function (there are many good ones out there) that scrambles the data so you can only recover the real data if you have the key.
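
Password hashing could look something like the sketch below, which uses only Python's standard library. It is a minimal example, not a finished login system: the function names, the iteration count and the choice of PBKDF2 with SHA-256 are my assumptions, and a real application might prefer a dedicated password-hashing library.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the server's hardware


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; the plain password is never stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# The server would store only `salt` and `stored`, never the password itself.
salt, stored = hash_password("correct horse battery staple")
```

The salt means two users with the same password get different hashes, and `hmac.compare_digest` avoids leaking information through how long the comparison takes.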

## Data dictionaries
### MAIN DICTIONARY
```json
{
   "Temps": TEMPS,
   "Rain": RAIN,
   "Names": NAMES,
}
```
#### TEMPS
```json
{
    LocationCode: {
        "Date": [
            Date,
            Date,
            ...
        ],
        "MaxTemp": [
            Temp,
            Temp,
            ...
        ],
        "MinTemp": [
            Temp,
            Temp,
            ...
        ]
    }
}
```
| Field | Datatype | Format for Display | Description | Example | Validation |
|-------|----------|--------------------|-------------|---------|------------|
| Date | Timestamp | YYYY-MM-DD | The date of the temperature reading | 2021-01-01 | Must be a valid date |
| MaxTemp | Float64 | NNN.N | The maximum temperature for the day | 30.0 | Must be a valid number |
| MinTemp | Float64 | NNN.N | The minimum temperature for the day | 20.0 | Must be a valid number |
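
The validation rules in this table could be checked with something like the sketch below. The helper name `validate_temps_entry` is hypothetical, and I am assuming that the three lists must be the same length and that `NaN` does not count as a valid number.

```python
from datetime import date
import math


def validate_temps_entry(entry):
    """Return a list of problems found in one location's TEMPS entry."""
    problems = []
    dates = entry.get("Date", [])
    for field in ("MaxTemp", "MinTemp"):
        values = entry.get(field, [])
        # Assumption: every date needs exactly one reading per field.
        if len(values) != len(dates):
            problems.append(f"{field} has {len(values)} values for {len(dates)} dates")
        for value in values:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or math.isnan(value):
                problems.append(f"{field} contains an invalid number: {value!r}")
    for text in dates:
        try:
            date.fromisoformat(text)  # dates are stored as YYYY-MM-DD strings
        except (TypeError, ValueError):
            problems.append(f"invalid date: {text!r}")
    return problems


sample = {
    "Date": ["2021-01-01", "2021-01-02"],
    "MaxTemp": [30.0, 31.5],
    "MinTemp": [20.0, 19.0],
}
print(validate_temps_entry(sample))
```

Returning a list of problems, rather than stopping at the first one, makes it easier to see everything that needs cleaning in one pass.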

#### RAIN
```json
[
    {
        LOCATIONCODE: {
            "Date": [
                Date,
                Date,
                ...
            ],
            "Rainfall": [
                Rainfall,
                Rainfall,
                ...
            ]
        }
    }, 
    {
        LOCATIONCODE: NAME,
        ...
    }
]
```
| Field | Datatype | Format for Display | Description | Example | Validation |
|-------|----------|--------------------|-------------|---------|------------|
| Date | Timestamp | YYYY-MM-DD | The date of the rainfall reading | 2021-01-01 | Must be a valid date |
| Rainfall | Float64 | NNN.N | The amount of rainfall for the day | 20.0 | Must be a valid number |
| LOCATIONCODE | Int64 | NNNNNN | The location code for the place in question | 4035 | Must be at most 6 digits and not negative |
| NAME | Object | XXXXXXXX | The name of the place | ROEBOURNE | Can be anything, but seems to be all uppercase in the data |

#### NAMES
| Field | Datatype | Format for Display | Description | Example | Validation |
|-------|----------|--------------------|-------------|---------|------------|
| Location | Int64 | NNNNNN | The location code for the place in question | 14723 | Must be at most 6 digits and not negative |
| State | Object | XXX | The state that the location is in, or Unknown | NT | Must be 2 or 3 upper case letters OR "Unknown" |
| Name | Object | XXXXXXXX | The name of the place | BORROLOOLA AIRPORT | Can be anything, but seems to be all uppercase in the data |
| Lat | Float64 | NNN.NNNN | The latitude of the location | -16.0755 | Must be a valid number |
| Long | Float64 | NNN.NNNN | The longitude of the location | 136.3041 | Must be a valid number |
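
As a minimal sketch, the rules in this table could be turned into checks like the ones below. All of the function names are hypothetical, and I am assuming each row arrives as a plain Python dictionary with the field names shown above.

```python
import re

STATE_PATTERN = re.compile(r"[A-Z]{2,3}")


def is_valid_location_code(code):
    """At most 6 digits and not negative."""
    return isinstance(code, int) and not isinstance(code, bool) and 0 <= code <= 999_999


def is_valid_state(state):
    """2 or 3 upper-case letters, or the literal 'Unknown'."""
    if state == "Unknown":
        return True
    return isinstance(state, str) and STATE_PATTERN.fullmatch(state) is not None


def is_valid_names_row(row):
    """Check every field of one NAMES row against the table's rules."""
    return (
        is_valid_location_code(row["Location"])
        and is_valid_state(row["State"])
        and isinstance(row["Name"], str)
        and isinstance(row["Lat"], float)
        and isinstance(row["Long"], float)
    )


row = {
    "Location": 14723,
    "State": "NT",
    "Name": "BORROLOOLA AIRPORT",
    "Lat": -16.0755,
    "Long": 136.3041,
}
print(is_valid_names_row(row))
```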

### API DICTIONARY
| Field | Datatype | Format for Display | Description | Example | Validation |
|-------|----------|--------------------|-------------|---------|------------|
| date | DateTime | YYYY-MM-DD HH:MM:SS | The date this specific weather measurement/prediction is from | 2024-05-19 14:00:00 | It is a valid date and time |
| temperature_2m | Float64 | NN.NNNNNN | The air temperature measured/predicted at 2 m above the ground | 7.437500 | It is a valid float |

### Data flow diagram
```mermaid
flowchart LR
    one["Data in (see above)"]
    two["Data cleaning"]
    three1["Data visualisation"]
    three2["Neural network training"]
    four["Neural network data visualisation"]
    one --> two --> three2 --> four
    two --> three1
```

## Other data I could use later
[Maybe useful link](http://www.bom.gov.au/climate/change/datasets/datasets.shtml)
 - atmospheric pressure
 - cloud formation
 - wind
 - humidity
