# Weather
---
In this notebook I explain the steps taken to complete tasks 1 to 7 of this module.

For task 8, I analyse the data from one of the JSON files downloaded in those tasks.

### Walkthrough of tasks
-----
To complete these tasks I can use many different [command line tools](https://www.linfo.org/command_line_lesson_1.html), such as [Cmder](https://github.com/cmderdev/cmder/wiki) (a console emulator for Windows), [Homebrew](https://en.wikipedia.org/wiki/Homebrew_(package_manager)) (a package manager for macOS and Linux) and [ohmyzsh](https://github.com/ohmyzsh/ohmyzsh/wiki/FAQ#what-is-oh-my-zsh-and-what-does-it-have-to-do-with-zsh) (a framework for configuring the Zsh shell), depending on their availability on the different [operating systems](https://en.wikipedia.org/wiki/Operating_system), for example macOS, Linux and Microsoft Windows.

I can also do these tasks in the cloud through [GitHub Codespaces](https://docs.github.com/en/codespaces/overview#what-is-a-codespace), which can be reached from any machine with a browser. This cloud environment runs Linux and accepts [Bash shell commands](https://www.linfo.org/command.html). In this walkthrough, I use GitHub Codespaces to perform the tasks, based on the lectures given in this module.

As the Linux Information Project (LINFO) explains, a [command line](https://www.linfo.org/command_line.html) is "...an instruction given by a human to tell a computer to do something."

Some of the commands can differ depending on the shell in use, but many are the same because they share a common origin: the early [Unix shells](https://en.wikipedia.org/wiki/Unix_shell#Early_shells) of the late 1960s and early 1970s. Later [command-line interfaces](https://en.wikipedia.org/wiki/Command-line_interface#History), such as [MS-DOS](https://devblogs.microsoft.com/commandline/windows-command-line-the-evolution-of-the-windows-command-line/#microsoft-%E2%80%93-unix-market-leader-yes-seriously), drew on these ideas, expanding the functionality of shells with new commands and modifying some features.

In the following tasks, I use commands that were explained during the lectures; they are also documented in the [Linux Handbook](https://linuxhandbook.com/a-to-z-linux-commands/).

https://www.bell-labs.com/usr/dmr/www/hist.html

https://hitm.fandom.com/wiki/Unix

**Task 1: Create Directory Structure**

Using the command line, create a directory (that is, a folder) named data at the root of your repository. Inside data, create two subdirectories: timestamps and weather.

Steps:

- In my GitHub account, I go to the repository "computer-infrastructure-assessment" and, through [Codespaces](https://docs.github.com/en/codespaces/overview), open a cloud-hosted environment with a Bash command line;
- in the command line, I check that I'm in the main directory, **/workspaces/computer-infrastructure-assessment** (the `pwd` command prints the current directory);
- then I use the command `mkdir` followed by the name of the folder that I want, `data`;
```bash
mkdir data
```
> mkdir: make directory.

- then to create two subdirectories, I can move into the new folder, using the command `cd data/`;
> cd: change directory.

- after that I can create the subdirectories, using again the command `mkdir`, one with `mkdir timestamps` and the other with `mkdir weather`;
- also, I can do the same without moving into data directory, for example, with `mkdir data/weather`.
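
As a possible shortcut for the steps above, the whole structure can be created in one command. This is a minimal sketch, assuming it is run from the root of the repository; the `-p` option tells `mkdir` to create any missing parent directories and not to complain if a directory already exists.

```shell
# Create data/ together with both subdirectories in a single step.
mkdir -p data/timestamps data/weather

# List the resulting structure recursively to confirm it.
ls -R data
```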

**Task 2: Timestamps**

Navigate to the data/timestamps directory. Use the date command to output the current date and time, appending the output to a file named now.txt. Make sure to use the >> operator to append (not overwrite) the file. Repeat this step ten times, then use the more command to verify that now.txt has the expected content.

Steps: 
- I check that I'm in the timestamps directory, */workspaces/computer-infrastructure-assessment/data/timestamps*. If not, I use `cd name_of_the_directory/` to move into a directory, or `cd ..` to move up one level, until I reach timestamps;
> .. : parent directory, that's one level above.

- to create the file, I can use:
```bash
touch now.txt
```
> touch: create a new file or change file timestamps.

- to write the current date in the txt file, I use:
```bash
date >> now.txt
```
> date: print current date and time.

> `>>`: append to file.

- I repeat the process until I have ten entries;
- finally, to check the contents as the task asks, I use:
```bash
more now.txt
```
> more: shows the contents of a file one screen at a time. For a short file like this one, `cat now.txt` gives the same result.
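
Instead of typing the command ten times, the repetition could be handed to a loop. The sketch below assumes it is run inside data/timestamps and that `seq` is available (it is part of GNU coreutils); the loop variable `i` is only a counter.

```shell
# Append the current date and time to now.txt ten times.
for i in $(seq 1 10); do
    date >> now.txt
done

# Count the lines to confirm how many entries the file holds.
wc -l < now.txt
```

If now.txt already contained entries, the count will be higher than ten, because `>>` never removes existing lines.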

**Task 3: Formatting Timestamps**

Run the date command again, but this time format the output using YYYYmmdd_HHMMSS (e.g., 20261114_130003 for 1:00:03 PM on November 14, 2026). Refer to the date man page (using man date) for more formatting options. (Press q to exit the man page). Append the formatted output to a file named formatted.txt.

Steps:

- to get the output in the format YYYYmmdd_HHMMSS, I need to write `date +"%Y%m%d_%H%M%S"`, as explained in the date manual, which opens with `man date`;
> man: manual of command.

- now to append the output to the new file, I write:
```bash
date +"%Y%m%d_%H%M%S" >> formatted.txt
```
> date +"...": displays the date and/or time in the format defined by the user.

> %Y: print year.

> %m: print month.

> %d: print day.

> %H: print hour.

> %M: print minutes.

> %S: print seconds.
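
One way to double-check the format string is to compare its output with a pattern. This is only a sketch: the variable name `stamp` is a hypothetical choice, and the pattern simply expects eight digits, an underscore and six digits.

```shell
# Capture the formatted timestamp in a variable.
stamp=$(date +"%Y%m%d_%H%M%S")

# Compare it with the expected shape: 8 digits, underscore, 6 digits.
if echo "$stamp" | grep -Eq '^[0-9]{8}_[0-9]{6}$'; then
    echo "$stamp matches YYYYmmdd_HHMMSS"
else
    echo "$stamp has an unexpected format"
fi
```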

**Task 4: Create Timestamped Files**

Use the touch command to create an empty file with a name in the YYYYmmdd_HHMMSS.txt format. You can achieve this by embedding your date command in backticks ` into the touch command. You should no longer use redirection (>>) in this step.

Step:
- To create an empty txt file named with the current date and time, I use the command:
```bash
touch `date +"%Y%m%d_%H%M%S.txt"`
```
> `` `date +"...txt"` ``: the backticks run the date command first and pass its output to touch as the file name. The .txt extension is part of the format string, so the file is created as a txt file. No redirection (`>`) is needed, because touch creates the file itself.
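
Bash also accepts `$(...)` as an equivalent to backticks for command substitution, and many people find it easier to read and to nest. The sketch below is an alternative form, not the one the task requires; the variable `name` is a hypothetical helper used so the file can be checked afterwards.

```shell
# Build the file name with $(...) instead of backticks.
name="$(date +"%Y%m%d_%H%M%S").txt"

# Create the empty file; the quotes keep the name as a single argument.
touch "$name"

# Show the new file; its size should be 0 bytes.
ls -l "$name"
```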

**Task 5: Download Today's Weather Data**

Change to the data/weather directory. Download the latest weather data for the Athenry weather station from Met Eireann using wget. Use the -O <filename> option to save the file as weather.json. The data can be found at this URL:
https://prodapi.metweb.ie/observations/athenry/today.

Steps:

- To move from timestamps directory into weather, first I will go back with `cd ..` to be in the data directory, and then I write `cd weather/`;
- In order to create the json file, I need to write the following command:
```bash
wget -O weather.json https://prodapi.metweb.ie/observations/athenry/today
```
> wget: download files over the internet. The -O will specify the filename for the downloaded file.

**Task 6: Timestamp the Data**

Modify the command from Task 5 to save the downloaded file with a timestamped name in the format YYYYmmdd_HHMMSS.json.

Step:

- To save the file with a timestamped name, I can use the command:
```bash
wget -O `date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
```
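
A download can fail, for example when the network is unavailable, so it can help to check whether `wget` succeeded. The following is a sketch under a few assumptions: `outfile` is a hypothetical variable name, and the `--tries` and `--timeout` values are arbitrary choices that stop the command from waiting for a long time.

```shell
# Build the timestamped file name first so it can be reused in messages.
outfile="$(date +"%Y%m%d_%H%M%S").json"

# Try the download once, giving up after 10 seconds without a response.
if wget --tries=1 --timeout=10 -O "$outfile" https://prodapi.metweb.ie/observations/athenry/today; then
    echo "Saved to $outfile"
else
    echo "Download failed; $outfile may be empty or missing"
fi
```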

**Task 7: Write the Script**

Write a bash script called weather.sh in the root of your repository. This script should automate the process from Task 6, saving the weather data to the data/weather directory. Make the script executable and test it by running it.

Steps:

- I need to make sure I'm in the root of my repository, **/workspaces/computer-infrastructure-assessment**;
- Then in the Explorer tab, I create a new file in codespaces, with the name `weather.sh`;
- in this file, I write the following lines:

~~~sh
#! /bin/bash

wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
~~~
> `#! /bin/bash`: the system will use bash as the interpreter for this script.

- to give the script weather.sh permission to run, I use the following command in the command line:
```bash
chmod u+x ./weather.sh
```
> chmod: change mode. `u+x` adds execute permission for the user who owns the file.

- finally, to run the script, I write:
```bash
./weather.sh
```
> `.`: current directory. There must be no space between `./` and the script name.
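
The script above works when it is started from the root of the repository. A slightly more defensive variant could locate the data folder relative to the script itself and create it if it is missing. The sketch below writes such a variant to a hypothetical file, `weather_sketch.sh`, makes it executable, and checks its syntax with `bash -n` without downloading anything; the names `script_dir` and `outfile` are illustrative.

```shell
# Write a hypothetical, more defensive variant of weather.sh.
cat > weather_sketch.sh <<'EOF'
#!/bin/bash
# Stop at the first command that fails.
set -e

# Folder that holds this script, so it can be started from any directory.
script_dir="$(cd "$(dirname "$0")" && pwd)"

# Make sure the output folder exists before downloading.
mkdir -p "$script_dir/data/weather"

# Timestamped file name in the format YYYYmmdd_HHMMSS.json.
outfile="$script_dir/data/weather/$(date +"%Y%m%d_%H%M%S").json"

wget -O "$outfile" https://prodapi.metweb.ie/observations/athenry/today
EOF

# Make it executable for the owner, then check the syntax without running it.
chmod u+x weather_sketch.sh
bash -n weather_sketch.sh && echo "syntax ok"
```

Running `./weather_sketch.sh` afterwards would perform the actual download.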

### Analysing data
---

```python
# Library.
import pandas as pd

# File to use.
weatherfile = "./weather/20241101_164019.json"
```

```python
# Load file.
df = pd.read_json(weatherfile)

# Check.
df.head(3)
```

| | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Athenry | 11 | 04n | Cloudy | "Cloudy" | 11 | - | SW | 225 | 89 | 0 | 1024 | Friday | 2024-01-11 | 00:00 |
| 1 | Athenry | 11 | 04n | Cloudy | "Cloudy" | 9 | - | SW | 225 | 89 | 0 | 1024 | Friday | 2024-01-11 | 01:00 |
| 2 | Athenry | 11 | 04n | Cloudy | "Cloudy" | 9 | - | SW | 225 | 89 | 0 | 1024 | Friday | 2024-01-11 | 02:00 |


This DataFrame holds a lot of weather information: station name, temperature, symbol, weather description and text, wind speed, wind gust, cardinal wind direction, wind direction, humidity, rainfall, pressure, day name, date and report time.

```python
# Inspect the types.
df.dtypes
```

```
name                             object
temperature                       int64
symbol                           object
weatherDescription               object
text                             object
windSpeed                         int64
windGust                         object
cardinalWindDirection            object
windDirection                     int64
humidity                          int64
rainfall                          int64
pressure                          int64
dayName                          object
date                     datetime64[ns]
reportTime                       object
dtype: object
```

The types of variables present in this data are object, int64 and datetime64. `windGust` is stored as object because its values here are the placeholder `-` rather than numbers.
A statistical description is possible for the numeric variables (ints or floats), as demonstrated below.

```python
# Describe the data.
df.describe()
```

| | temperature | windSpeed | windDirection | humidity | rainfall | pressure | date |
|---|---|---|---|---|---|---|---|
| count | 17.0 | 17.0 | 17.0 | 17.0 | 17.0 | 17.0 | 17 |
| mean | 11.764706 | 6.941176 | 219.705882 | 89.058824 | 0.0 | 1024.823529 | 2024-01-11 00:00:00 |
| min | 11.0 | 4.0 | 180.0 | 85.0 | 0.0 | 1024.0 | 2024-01-11 00:00:00 |
| 25% | 11.0 | 6.0 | 225.0 | 88.0 | 0.0 | 1024.0 | 2024-01-11 00:00:00 |
| 50% | 11.0 | 6.0 | 225.0 | 89.0 | 0.0 | 1024.0 | 2024-01-11 00:00:00 |
| 75% | 13.0 | 7.0 | 225.0 | 91.0 | 0.0 | 1026.0 | 2024-01-11 00:00:00 |
| max | 13.0 | 11.0 | 270.0 | 92.0 | 0.0 | 1026.0 | 2024-01-11 00:00:00 |
| std | 0.903425 | 1.675955 | 21.828206 | 2.276801 | 0.0 | 0.951006 | |


Calling `describe()` on the DataFrame makes pandas generate summary statistics: the count, mean, minimum, maximum, standard deviation and the 25th, 50th and 75th percentiles of each column. [W3schools](https://www.w3schools.com/python/python_ml_percentile.asp) explains that "Percentiles are used in statistics to give you a number that describes the value that a given percent of the values are lower than."
The only variable we can ignore here is the date, because every row has the same date and its statistics carry no useful information. Note that the date is shown as 2024-01-11, while the file name (20241101) and the day name (Friday) point to 1 November 2024, so the day and month appear to have been swapped when the file was read.


https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html
https://www.w3schools.com/python/pandas/ref_df_describe.asp

_______
## End