# Task 8: weather.ipynb


>Create a notebook called weather.ipynb at the root of your repository. In this notebook, write a brief report explaining how you completed Tasks 1 to 7. Provide short descriptions of the commands used in each task and explain their role in completing the tasks.

# Task 1. Create directory structure.

>Using the command line, create a directory (that is, a folder) named data at the root of your repository. Inside data, create two subdirectories: timestamps and weather.

The `mkdir` command creates a new directory in the current location. For example, if we are at the root of the repository and type `mkdir data`, a new directory called data is created there. The subdirectories can then be created in several different ways.

![data](./images/data.png)

Probably the simplest method is to move into the data directory by typing `cd data`; the `cd` command stands for 'change directory'. We then run the `mkdir` command inside this directory to create the timestamps and weather directories one at a time.

![weather/timestamps](./images/weather.png)

This is not the only way to create directories and subdirectories. The `mkdir` command accepts various options, sometimes called flags. The `-p` flag, for example, creates any missing parent directories before creating the child directory named in the command. If we type `mkdir -p data/weather` at the root of the repository, the parent directory data is created there (if it does not already exist) and the child directory weather is created inside it. We can then run `mkdir -p data/timestamps` from the same location to create the timestamps folder.

We could also create all the folders at once. We still use the `-p` flag, but this time we list every subdirectory after it: `mkdir -p data/weather data/timestamps` creates both the weather and timestamps subdirectories.

![mkdir](./images/mkdir.png)

Alternatively, we could list the subdirectories inside curly braces, separated by commas with no spaces between the directory names. The shell expands the braces before `mkdir` runs, so typing `mkdir -p data/{weather,timestamps}` creates the directory structure we require.

![curly](./images/mkdir2.png)
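As a minimal sketch, the whole structure can be created and checked in one go. The example runs inside a throwaway directory from `mktemp -d` (an assumption made here so that it does not touch a real repository) and uses the space-separated form, because brace expansion is a feature of shells such as bash and is not available in every `sh`.

```bash
# Work in a scratch directory so nothing in the real repository is touched.
workdir=$(mktemp -d)
cd "$workdir"

# -p creates the parent (data) as needed and does not complain
# if a directory already exists.
mkdir -p data/weather data/timestamps

# List every directory that now exists under data.
find data -type d | sort
```

Running `mkdir -p` a second time with the same arguments is harmless, which makes the command safe to repeat.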

# Task 2. Timestamps

>Navigate to the data/timestamps directory. Use the date command to output the current date and time, appending the output to a file named now.txt. Make sure to use the >> operator to append (not overwrite) the file. Repeat this step ten times, then use the more command to verify that now.txt has the expected content.

The `date` command displays the day, date, time and time zone when it is run, as shown below.

![date command](./images/date.png)

The output of a command can be appended to a file with the redirect operator `>>`. If the file does not exist, a new file is created. To append the date and time to a file, the syntax is `date >> <filename>`, where \<filename\> is the name of the file to create or append to, as shown in the image below. There are other redirect operators too. For example, a single `>` also sends output to a file, but it overwrites any existing content instead of appending to it.

![now.txt](./images/now.png)
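The difference between appending and overwriting can be seen in a short sketch. The file name `demo.txt` and the scratch directory are hypothetical, chosen only for illustration.

```bash
# Work in a scratch directory (hypothetical, for illustration only).
workdir=$(mktemp -d)
cd "$workdir"

echo "first"  >> demo.txt   # the file does not exist yet, so it is created
echo "second" >> demo.txt   # appended: the file now has two lines
appended=$(wc -l < demo.txt)

echo "third" > demo.txt     # a single > overwrites the existing content
overwritten=$(wc -l < demo.txt)

echo "lines after >>: $appended, lines after >: $overwritten"
```

This is why the task insists on `>>`: with `>`, each run of `date` would replace the previous timestamp instead of adding to it.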

The date command was run 10 times as directed and the output was appended to now.txt. We can view the contents of this file by using the `more` command. The results of this are shown below.

![more](./images/more.png)
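Rather than typing the command ten times, the repetition could also be scripted. The following is a sketch of an alternative, not the method used above; it works in a scratch directory so that it does not add lines to the real now.txt.

```bash
# Work in a scratch directory (an assumption for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Append the current date and time ten times.
i=1
while [ "$i" -le 10 ]; do
    date >> now.txt
    i=$((i + 1))
done

# Count the lines to confirm that all ten runs were recorded.
line_count=$(wc -l < now.txt)
echo "now.txt contains $line_count lines"
```

Counting lines with `wc -l` is a quick complement to reading the file with `more`.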







# Task 3: Formatting Timestamps

> Run the date command again, but this time format the output using YYYYmmdd_HHMMSS (e.g., 20261114_130003 for 1:00:03 PM on November 14, 2026). Refer to the date man page (using man date) for more formatting options. (Press q to exit the man page). Append the formatted output to a file named formatted.txt.

Some of the most common formatting options are given in the table below. For example, `%a` gives the abbreviated weekday name (Mon, Tue, Wed, etc.), `%A` gives the locale's full weekday name (Monday, Tuesday, Wednesday, etc.), and `%w` gives the weekday as a number, where 0 is Sunday and 6 is Saturday. To see all the options, refer to the manual page for the `date` command by typing `man date` on the command line, as shown below.


![man date](./images/man_date.png)

|Directive | Meaning |
| --- | --- |
| `%a` | Weekday as locale's abbreviated name |
| `%A` | Weekday as locale's full name |
| `%w` | Weekday as decimal number, where 0 is Sunday and 6 is Saturday |
| `%d` | Day of the month as a zero-padded decimal number [01, 02, ..., 30, 31] |
| `%b` | Month as locale's abbreviated name |
| `%B` | Month as locale's full name |
| `%m` | Month as zero-padded decimal number [01, 02, ..., 11, 12] |
| `%y` | Year without century as a zero-padded decimal number [00, 01, ..., 98, 99] |
| `%Y` | Year with century as a decimal number (e.g. 2024) |
| `%H` | Hour (24-hour clock) as a zero-padded decimal number [00, 01, ..., 22, 23] |
| `%I` | Hour (12-hour clock) as a zero-padded decimal number |
| `%p` | Locale equivalent of either AM or PM |
| `%M` | Minute as a zero-padded decimal number [00, 01, ..., 58, 59]|
| `%S` | Second as a zero-padded decimal number [00, 01, ..., 58, 59]|
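| `%N` | Nanoseconds as a zero-padded decimal number (GNU `date`; the microsecond directive `%f` belongs to Python's `strftime` and is not recognised by `date`) |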
| `%j` | Day of the year as a zero-padded decimal number |
| `%W` | Week number of the year (Monday as the first day of the week) as a decimal number |
| `%U` | Week number of the year (Sunday as the first day of the week) as a decimal number |
| `%c` | Locale’s appropriate date and time representation |
| `%Z` | Time zone name |
| `%z` | UTC offset in the form ±HHMM (e.g. +0100) |


For this task the date should be formatted as YYYYmmdd_HHMMSS, which corresponds to the format string `%Y%m%d_%H%M%S` using the table above. The steps are shown in the terminal window below. First we change directory to the timestamps folder. Next we append the output of `date +"%Y%m%d_%H%M%S"` to the formatted.txt file with the `>>` operator, using the `date +"%Y%m%d_%H%M%S" >> formatted.txt` command. If formatted.txt does not exist, it is created. We can then view the file with the `more formatted.txt` command.

![formatted.txt](./images/formatted.png)
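One way to check that the lines in the file follow the YYYYmmdd_HHMMSS pattern is to test them against a regular expression. The sketch below works in a scratch directory and uses `grep -E`; the pattern and variable names are illustrations added here, not part of the task.

```bash
# Work in a scratch directory (an assumption for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Append a formatted timestamp such as 20261114_130003.
date +"%Y%m%d_%H%M%S" >> formatted.txt

# Count the lines made of eight digits, an underscore and six digits.
# -E enables extended regular expressions, -c prints the number of matches.
matching=$(grep -Ec '^[0-9]{8}_[0-9]{6}$' formatted.txt)
total=$(wc -l < formatted.txt)

echo "$matching of $total lines match YYYYmmdd_HHMMSS"
```

If the two counts differ, at least one line was written with a different format string.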

# Task 4: Create Timestamped Files

>Use the touch command to create an empty file with a name in the YYYYmmdd_HHMMSS.txt format. You can achieve this by embedding your date command in backticks ` into the touch command. You should no longer use redirection (>>) in this step.



Putting a command in backticks runs it and replaces it with its output, which can then be assigned to a variable or passed to another command. This is known as command substitution. Here we put the `date` command in backticks so that its output is passed to the `touch` command. There are two forms of command substitution in bash:

1. $( )
1. Using backticks ``

Consider the following example, which can be found [here](https://sysxplore.com/bash-command-substitution):

```bash
today=$(date +%F)

echo "Today's date is $today"
```
This code executes the date command, captures its output, and assigns it to the $today variable. The $today variable is then displayed using the echo command.
Without command substitution, the code would look like this:

```bash
today="date +%F"
echo "Today's date is $today"
```

This would print the literal string "date +%F" instead of executing the date command and capturing its output.

When the name of a file is formatted as `YYYYmmdd_HHMMSS`, sorting the names also sorts the files chronologically, so a plain `ls` lists the oldest file first and the most recent file last.

For this task, let us create a new file using the `touch` command. By embedding the formatted `date` command in backticks we can pass it to the `touch` command and this will give the file the desired name as a timestamp. 

![touch](./images/touch.png)

Let us take a look at the file that has been created using the `touch` command. To do this we run `ls -alrt` in the timestamps directory: `-a` includes hidden files, `-l` gives the long listing, and `-rt` sorts by modification time with the newest file last. The listing shows that the file was created on 19 October 2024 at 19:15. We can confirm that the file has been named with a timestamp by looking at its name, 20241019_191526.txt. We can also see that the file is empty, with a size of zero bytes.

![timestamp](./images/timestamp.png)
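The same step can be sketched with the `$( )` form of command substitution, which behaves like backticks here but is easier to read and to nest. Storing the name in a variable (called `stamp` here purely for illustration) also makes it possible to check the file afterwards.

```bash
# Work in a scratch directory (an assumption for this example).
workdir=$(mktemp -d)
cd "$workdir"

# Capture the timestamp once so the same name can be reused below.
stamp=$(date +"%Y%m%d_%H%M%S")

# Equivalent to: touch `date +"%Y%m%d_%H%M%S"`.txt
touch "$stamp.txt"

# -f checks that the file exists; -s is true only for non-empty files.
if [ -f "$stamp.txt" ] && [ ! -s "$stamp.txt" ]; then
    echo "$stamp.txt exists and is empty"
fi
```

Capturing the timestamp in a variable avoids calling `date` twice, which could otherwise produce two different names if the second changes between calls.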

# Task 5: Download Today's Weather Data

>Change to the data/weather directory. Download the latest weather data for the Athenry weather station from Met Eireann using wget. Use the -O <filename> option to save the file as weather.json. The data can be found at this [URL:](https://prodapi.metweb.ie/observations/athenry/today)

The `wget` command downloads files from the internet over the HTTP, HTTPS and FTP protocols. It is a non-interactive downloader, so it can keep downloading from the server even when the user is not logged on to the system. It can also work in the background without hindering the current process.

The basic syntax of the wget command is as follows:

```bash
wget [option] [URL]
```
where 
1. [option] represents various command-line options that modify the behaviour of wget
1. [URL] is the address of the file or website to be downloaded.

Some useful options for the `wget` command and their functions are given in the table below. This information has been taken from the [GeeksforGeeks website](https://www.geeksforgeeks.org/wget-command-in-linux-unix/).


| Option | Description | Syntax |
|---|---|---|
| `-V` / `--version` | Displays the version of wget installed on your system | `$ wget -V` |
| `-c` | Resumes a partially downloaded file if the server supports resuming; otherwise the download cannot be resumed | `$ wget -c [URL]` |
| `-O <filename>` | Writes the downloaded content to the named file | `$ wget -O <filename> [URL]` |
| `-b` / `--background` | Downloads a file in the background | `$ wget -b [URL]` |
| `-h` / `--help` | Prints a help message listing all available command-line options for wget | `$ wget -h` |


We use the `-O` option described above to write the weather data to a file of our choosing, which we have called weather.json.

![weather](./images/weatherjson.png)

# Task 6: Timestamp the Data

>Modify the command from Task 5 to save the downloaded file with a timestamped name in the format YYYYmmdd_HHMMSS.json.

As in Task 4 above, the `date` command is embedded in backticks in order to save the file with a timestamped name. The only difference is that in Task 4 we saved the file as a txt file, whereas here we save it as a json file.

![timestamped.json](./images/timestampedweather.png)
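Put together, the download step might look like the sketch below. The URL is the one given in Task 5; the variable names, the scratch directory, and the `-q`, `--tries` and `--timeout` settings are assumptions added here so that the command stays quiet and gives up quickly if the network is unavailable.

```bash
# Work in a scratch directory (an assumption for this example).
workdir=$(mktemp -d)
cd "$workdir"

url="https://prodapi.metweb.ie/observations/athenry/today"

# Build the file name first, e.g. 20241019_191526.json.
outfile="$(date +"%Y%m%d_%H%M%S").json"

# -q: quiet output, -O: write the download to the named file.
# --tries=1 and --timeout=10 stop wget from retrying or waiting for long.
if wget -q --tries=1 --timeout=10 -O "$outfile" "$url"; then
    echo "Saved weather data to $outfile"
else
    echo "Download failed; check the network connection" >&2
fi
```

Building the name in a variable before calling `wget` means the same name can be reported or reused later in a script.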

# Task 9: Pandas

>In your weather.ipynb notebook, use the pandas function read_json() to load in any one of the weather data files you have downloaded with your script. Examine and summarize the data. Use the information provided data.gov.ie to write a short explanation of what the data set contains

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


# Collecting the data

The following code outputs the filename used to store the data file.
```bash
date +"%Y%m%d_%H%M%S_athenry.json"
```
The `%Y` is replaced by the four-digit year (e.g. 2024).

In [7]:
# Read the data.
df = pd.read_json('data/weather/20241112_214914_athenry.json')

df.head()


|  | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Athenry | 2 | 15n | Fog / Mist | "Recent Fog" | 4 | - | NE | 45 | 98 | 0.0 | 1041 | Tuesday | 2024-12-11 | 00:00 |
| 1 | Athenry | 1 | 15n | Fog / Mist | "Recent Fog" | 2 | - | NE | 45 | 96 | 0.0 | 1041 | Tuesday | 2024-12-11 | 01:00 |
| 2 | Athenry | 1 | 15n | Fog / Mist | "Mist" | - | - |  | 0 | 99 | 0.0 | 1041 | Tuesday | 2024-12-11 | 02:00 |
| 3 | Athenry | 2 | 15n | Fog / Mist | "Fog" | 2 | - | E | 90 | 99 | 0.0 | 1041 | Tuesday | 2024-12-11 | 03:00 |
| 4 | Athenry | 3 | 15n | Fog / Mist | "Fog" | 7 | - | E | 90 | 98 | 0.0 | 1041 | Tuesday | 2024-12-11 | 04:00 |


In [8]:
df.describe()

|  | temperature | windDirection | humidity | rainfall | pressure | date |
| --- | --- | --- | --- | --- | --- | --- |
| count | 22.0 | 22.0 | 22.0 | 22.0 | 22.0 | 22 |
| mean | 4.5 | 79.772727 | 98.727273 | 0.005 | 1041.136364 | 2024-12-11 00:00:00 |
| min | 1.0 | 0.0 | 96.0 | 0.0 | 1040.0 | 2024-12-11 00:00:00 |
| 25% | 3.0 | 45.0 | 99.0 | 0.0 | 1041.0 | 2024-12-11 00:00:00 |
| 50% | 4.5 | 90.0 | 99.0 | 0.0 | 1041.0 | 2024-12-11 00:00:00 |
| 75% | 6.0 | 90.0 | 99.0 | 0.0 | 1041.0 | 2024-12-11 00:00:00 |
| max | 7.0 | 135.0 | 99.0 | 0.1 | 1042.0 | 2024-12-11 00:00:00 |
| std | 2.017778 | 39.111479 | 0.7025 | 0.021325 | 0.467563 |  |


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   name                   22 non-null     object        
 1   temperature            22 non-null     int64         
 2   symbol                 22 non-null     object        
 3   weatherDescription     22 non-null     object        
 4   text                   22 non-null     object        
 5   windSpeed              22 non-null     object        
 6   windGust               22 non-null     object        
 7   cardinalWindDirection  22 non-null     object        
 8   windDirection          22 non-null     int64         
 9   humidity               22 non-null     int64         
 10  rainfall               22 non-null     float64       
 11  pressure               22 non-null     int64         
 12  dayName                22 non-null     object        
 13  date   

The `df.info()` output reveals several important aspects of the dataset. First, there are 22 entries in total. The dataset contains 15 columns with different data types: nine columns are of type object (e.g., name, symbol, weatherDescription), four are of type int64 (e.g., temperature, windDirection), one is of type float64 (rainfall), and one is of type datetime64[ns] (date). The windSpeed and windGust columns appear among the object columns because some readings are recorded as "-" rather than as numbers.

One detail is worth checking. The file name 20241112_214914_athenry.json indicates a download on 12 November 2024, which was a Tuesday and so agrees with the dayName column, yet the date column shows 2024-12-11. This suggests that a day-first date string was interpreted as month-first when the file was loaded.

In [10]:
df.isnull().sum()

name                     0
temperature              0
symbol                   0
weatherDescription       0
text                     0
windSpeed                0
windGust                 0
cardinalWindDirection    0
windDirection            0
humidity                 0
rainfall                 0
pressure                 0
dayName                  0
date                     0
reportTime               0
dtype: int64

### End