# Computer Infrastructure - Assessment

**Author: Rodrigo De Martino Ucedo**
****

This Jupyter Notebook was created as part of the Computer Infrastructure assessment for the Higher Diploma in Data Analytics course at ATU. It contains a brief report outlining how the tasks were completed. Additionally, the report provides short descriptions of the commands used in each task, along with explanations of their roles in achieving the task objectives.

### Task 1: Create Directory Structure

- **Task description:** Using the command line, create a directory (a folder) named `data` at the root of your repository. Inside `data`, create two subdirectories: `timestamps` and `weather`.

To create the main `data` directory, I used the `mkdir` command, which stands for "make directory." [1] The command to create the `data` directory was:

```
$ mkdir data
```

The `ls` command was used to verify that the `data` directory had been created in the `computer_infrastructure` repository.

Inside the `data` directory, I needed two subdirectories: `timestamps` and `weather`. Instead of running two separate `mkdir` commands, I passed both paths to a single `mkdir` call. The `-p` option creates any missing parent directories, so the command would succeed even if `data` did not already exist, and it does not report an error for directories that are already there.

```
$ mkdir -p data/timestamps data/weather
```

This command creates both the `timestamps` and `weather` directories within the `data` directory. The `ls` command was used again to verify that the `timestamps` and `weather` subdirectories had been created in `data`.
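The behaviour of `mkdir -p` can be tried safely outside the repository. The sketch below is only an illustration: it assumes a throwaway directory created with `mktemp -d`, and the variable name `workdir` is hypothetical.

```shell
# Work in a throwaway directory so the real repository is untouched (assumption).
workdir=$(mktemp -d)
cd "$workdir"

# -p creates the missing parent `data` as well as both subdirectories.
mkdir -p data/timestamps data/weather

# Running the command again is harmless: -p does not fail on existing directories.
mkdir -p data/timestamps data/weather

# List the contents of data to confirm both subdirectories exist.
ls data
```

Without `-p`, the second call would stop with an error because the directories already exist.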

### Task 2: Timestamps

- **Task description:** Navigate to the `data/timestamps` directory. Use the `date` command to output the current date and time, appending the output to a file named `now.txt`. Make sure to use the `>>` operator to append (not overwrite) the file. Repeat this step ten times, then use the `more` command to verify that `now.txt` has the expected content.

The first step was to change the current working directory to `data/timestamps`, where the now.txt file should be created. I used the `cd` (change directory) command:

```
$ cd data/timestamps
```

To append the current date and time to a file, I used the `date` command with the custom format string `+"%Y%m%d%H%M%S"`, which prints the year, month, day, hour, minute, and second as one run of digits. [2] The `>>` operator was used to append this output to the file `now.txt`, rather than overwriting any existing content. [3]

```
$ date +"%Y%m%d%H%M%S" >> now.txt
```
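The difference between `>` and `>>` can be seen in a small sketch. The file name `demo.txt` and the temporary directory are assumptions made for illustration and are not part of the assessment.

```shell
# Use a throwaway directory (assumption) so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# > creates the file, or replaces its content if it already exists.
echo "first" > demo.txt
echo "second" > demo.txt    # "first" is gone, the file holds only "second"

# >> keeps the existing content and adds the new line at the end.
echo "third" >> demo.txt

# Show the result: two lines, "second" then "third".
cat demo.txt
```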

I then ran the same `date +"%Y%m%d%H%M%S" >> now.txt` command nine more times, so that `now.txt` holds ten timestamps in total:

```
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
$ date +"%Y%m%d%H%M%S" >> now.txt
```

After appending the date and time 10 times, I used the `more` command to verify that the content of `now.txt` was as expected. [4] The `more` command displays the contents of a file one page at a time:

```
$ more now.txt
```

This allows me to check that the file contains the 10 appended entries with the current date and time.
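The ten repetitions could also be written as a loop. This is a minimal sketch rather than the approach used for the assessment; it assumes a temporary working directory, and the loop variable `i` is used only for counting.

```shell
# Use a throwaway directory (assumption) so an existing now.txt is not changed.
workdir=$(mktemp -d)
cd "$workdir"

# Append the current date and time ten times.
for i in 1 2 3 4 5 6 7 8 9 10; do
    date +"%Y%m%d%H%M%S" >> now.txt
done

# Count the lines to confirm there are ten entries.
wc -l < now.txt
```

Because the loop finishes almost instantly, several of the timestamps are likely to be identical.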

### Task 3: Formatting Timestamps

- **Task description:** Run the `date` command again, but this time format the output using `YYYYmmdd_HHMMSS` (e.g., `20241126_130004` for 1:00:04 PM on November 26, 2024). Refer to the date man page (using `man date`) for more formatting options. (Press `q` to exit the man page). Append the formatted output to a file named `formatted.txt`.

To format the date in the desired `YYYYmmdd_HHMMSS` format, I used the `date` command with a custom format string:

- `%Y`: 4-digit year (e.g., 2024).
- `%m`: 2-digit month (e.g., 11 for November).
- `%d`: 2-digit day of the month (e.g., 26).
- `%H`: 2-digit hour in 24-hour format (e.g., 13 for 1 PM).
- `%M`: 2-digit minute (e.g., 20).
- `%S`: 2-digit second (e.g., 04).

This produces the current date and time in the format `YYYYmmdd_HHMMSS`. For example, it would output `20241126_132004` for 1:20:04 PM on November 26, 2024.

To store the formatted output in a file named `formatted.txt`, I used the append operator `>>`. This ensures that each new timestamp is added to the end of the file without overwriting the existing content.

```
$ date "+%Y%m%d_%H%M%S" >> formatted.txt
```

This command appends the formatted timestamp to `formatted.txt`. I then used the `more` command to verify that the timestamp had been appended correctly.

```
$ more formatted.txt
```

This command displays the contents of `formatted.txt`, allowing me to check that the file contains the expected formatted timestamps.
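One way to check the shape of the formatted timestamp without reading it by eye is to match it against a pattern. The sketch below is illustrative only; the variable name `stamp` and the use of `grep -E` are assumptions, not part of the task.

```shell
# Capture the formatted timestamp in a variable.
stamp=$(date "+%Y%m%d_%H%M%S")
echo "$stamp"

# Expect eight digits, an underscore, then six digits.
if echo "$stamp" | grep -Eq '^[0-9]{8}_[0-9]{6}$'; then
    echo "format OK"
else
    echo "unexpected format"
fi
```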

### Task 4: Create Timestamped Files

- **Task description:** Use the `touch` command to create an empty file with a name in the `YYYYmmdd_HHMMSS.txt` format. You can achieve this by embedding your date command in backticks `` ` `` into the touch command. You should no longer use redirection (`>>`) in this step.

The `touch` command is used to create empty files. [5] I used backticks (`` ` ``) to embed the `date` command within the `touch` command to create a file with a name that includes the current timestamp.

```
$ touch `date "+%Y%m%d_%H%M%S".txt`
```
After executing the `touch` command, I used the `ls` command to verify that the file was created with the correct timestamped name. [6] This command lists the files in the current directory, allowing me to confirm that a file with the timestamped name was successfully created.
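Command substitution can also be written as `$(...)`, which behaves like backticks here and is easier to nest and quote. The following sketch shows that form under the assumption of a temporary working directory; the task itself asks for backticks.

```shell
# Use a throwaway directory (assumption) so the listing shows only the new file.
workdir=$(mktemp -d)
cd "$workdir"

# $(...) runs date first, then touch receives the resulting file name.
touch "$(date "+%Y%m%d_%H%M%S").txt"

# List the directory to see the timestamped file.
ls
```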

### Task 5: Download Today's Weather Data

- **Task description:** Change to the `data/weather` directory. Download the latest weather data for the Athenry weather station from Met Eireann using `wget`. Use the `-O <filename>` option to save the file as `weather.json`. The data can be found at this URL: `https://prodapi.metweb.ie/observations/athenry/today`. [7]

The first step was to ensure I was in the correct directory where the weather data file should be saved. I used the `cd` command to navigate to the `data/weather` directory:

```
$ cd data/weather
```
This command changes the current directory to `data/weather`, ensuring that the downloaded weather data is saved in the correct location.

To download the weather data for the Athenry weather station from Met Eireann, I used the `wget` command. [8] The `wget` utility is a command-line tool for downloading files from the web. I used the `-O` option to specify the output filename as `weather.json`. The `-O` option allows me to save the file with a custom name, instead of the default name that `wget` derives from the URL. [9]

```
$ wget -O weather.json https://prodapi.metweb.ie/observations/athenry/today
```

After running the `wget` command, I used the `ls` command to check that the file `weather.json` was correctly downloaded and saved in the `data/weather` directory. This command lists the contents of the current directory, allowing me to confirm that the `weather.json` file exists.


### Task 6: Timestamp the Data

- **Task description:** Modify the command from Task 5 to save the downloaded file with a timestamped name in the format `YYYYmmdd_HHMMSS.json`.

To generate a timestamp for the filename, I used the `date` command with a custom format string. The goal was to format the current date and time in the `YYYYmmdd_HHMMSS` format, which will be used as part of the filename. The next step was to modify the `wget` command from Task 5 so that it saves the downloaded weather data with the timestamped filename:

```
$ wget -O `date "+%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
```
After running the modified `wget` command, I used the `ls` command to verify that the file was saved with the correct timestamped name in the `data/weather` directory.

### Task 7: Write the Script

- **Task description:** Write a bash script called `weather.sh` in the root of your repository. This script should automate the process from Task 6, saving the weather data to the `data/weather` directory. Make the script executable and test it by running it.

I used the `touch` command to create a new bash script file named `weather.sh` in the root directory of the repository. [10]

```
$ touch weather.sh
```

Inside the script, I included a single `wget` command that:
- Downloads the weather data from the Met Eireann URL.
- Writes it directly into the `data/weather` directory by including that path in the `-O` argument, so the script must be run from the root of the repository.
- Names the file in the `YYYYmmdd_HHMMSS.json` format, using the `date` command in backticks.

```
#! /bin/bash

wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
```

After writing the script, I saved and closed the file. In order for the script to be executable, I needed to change its permissions using the `chmod` command: [11]

```
$ chmod +x weather.sh
```

This command adds execute permissions to `weather.sh`, allowing it to be run as a script. After making the script executable, I ran the script to ensure it works correctly:

```
$ ./weather.sh
```

The script downloads the weather data and saves it in the `data/weather` directory with a timestamped filename in the format `YYYYmmdd_HHMMSS.json`. Because the output path is relative, the script has to be run from the root of the repository. I used the `ls` command to verify that the file was saved with the correct timestamped name in the `data/weather` directory.
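A slightly more defensive layout for such a script might separate building the file name from the download. This is a sketch under stated assumptions, not the submitted `weather.sh`: the function names `weather_filename` and `download_weather` are hypothetical, and the download is defined but not called so the example can be run without network access.

```shell
#!/bin/bash
# Hypothetical variant of weather.sh; the function names are illustrative.

# Build the timestamped file name in the YYYYmmdd_HHMMSS.json format.
weather_filename() {
    date "+%Y%m%d_%H%M%S.json"
}

# Download today's observations into data/weather.
download_weather() {
    # Create the target directory if it is missing.
    mkdir -p data/weather
    wget -O "data/weather/$(weather_filename)" \
        https://prodapi.metweb.ie/observations/athenry/today
}

# Show the path that would be used; call download_weather to fetch the data.
echo "Next file: data/weather/$(weather_filename)"
```

Adding a final line containing `download_weather` would make the script perform the download. The `mkdir -p` call means the script does not depend on `data/weather` having been created beforehand.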


### Task 8: Notebook

- **Task description:** Create a notebook called `weather.ipynb` at the root of your repository. In this notebook, write a brief report explaining how you completed Tasks 1 to 7. Provide short descriptions of the commands used in each task and explain their role in completing the tasks.

I used the `touch` command to create the `weather.ipynb` notebook file at the root of the repository. The command I ran was:

```
$ touch weather.ipynb
```
This command created the `weather.ipynb` file without any content. To ensure that the file was created successfully, I listed the files in the current directory using the `ls` command.

***
## Task 9: Weather Analysis
***

**Task description:** In your `weather.ipynb` notebook, use the `pandas` function `read_json()` to load in any one of the weather data files you have downloaded with your script. Examine and summarize the data. Use the information provided [data.gov.ie](https://data.gov.ie/dataset/todays-weather-athenry) to write a short explanation of what the data set contains. [12]

First, I imported the necessary libraries to work with the weather data.

In [1]:
# Data frame.
import pandas as pd

#### Load data

I used the `pd.read_json()` function to load the weather data from a local JSON file. [13] This function reads the JSON file and converts it into a pandas DataFrame, which allows for easy manipulation and analysis of the data.

In [None]:
# Read the data.
df = pd.read_json('data/weather/20241114_172802.json')

#### Inspect Data

Once the data was loaded, I performed a quick inspection to understand its structure and contents. I used the following methods to summarize the data:

- `df.head()` to display the first few rows of the dataset.

- `df.tail()` to display the last few rows of the dataset.

- `df.info()` to get an overview of the column names, data types, and non-null values.

- `df.isnull().sum()` to display the count of missing values for each column in the dataset.

- `df.describe()` to get basic statistical summaries of numerical columns.

In [3]:
# The first 5 rows of the dataset.
df.head()

|  | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Athenry | 9.0 | 15n | Fog / Mist | "Fog thickening" | 2 | - | SW | 225 | 100 | 0.1 | 1038.0 | Thursday | 2024-11-14 | 00:00 |
| 1 | Athenry | 9.0 | 15n | Fog / Mist | "Fog thickening" | 2 | - | W | 270 | 100 | 0.0 | 1038.0 | Thursday | 2024-11-14 | 01:00 |
| 2 | Athenry | 10.0 | 15n | Fog / Mist | "Fog thinning" | - | - |  | 0 | 99 | 0.0 | 1037.0 | Thursday | 2024-11-14 | 02:00 |
| 3 | Athenry | 10.0 | 09n | Rain | "Moderate rain " | 4 | - | N | 0 | 100 | 0.01 | 1037.0 | Thursday | 2024-11-14 | 03:00 |
| 4 | Athenry | 10.0 | 09n | Rain | "Moderate rain " | 4 | - | N | 0 | 99 | 0.01 | 1037.0 | Thursday | 2024-11-14 | 04:00 |


In [4]:
# The last 5 rows of the dataset.
df.tail()

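|  | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Athenry |  |  |  |  |  | - | -99 | 0 | -99 |  |  | Thursday | 2024-11-14 | 13:00 |
| 14 | Athenry |  |  |  |  |  | - | -99 | 0 | -99 |  |  | Thursday | 2024-11-14 | 14:00 |
| 15 | Athenry |  |  |  |  |  | - | -99 | 0 | -99 |  |  | Thursday | 2024-11-14 | 15:00 |
| 16 | Athenry |  |  |  |  |  | - | -99 | 0 | -99 |  |  | Thursday | 2024-11-14 | 16:00 |
| 17 | Athenry |  |  |  |  |  | - | -99 | 0 | -99 |  |  | Thursday | 2024-11-14 | 17:00 |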


The dataset includes several weather variables: temperature, wind direction, humidity, rainfall, pressure, and date.

In [5]:
# Information about the dataset.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   name                   18 non-null     object        
 1   temperature            11 non-null     float64       
 2   symbol                 18 non-null     object        
 3   weatherDescription     18 non-null     object        
 4   text                   18 non-null     object        
 5   windSpeed              18 non-null     object        
 6   windGust               18 non-null     object        
 7   cardinalWindDirection  18 non-null     object        
 8   windDirection          18 non-null     int64         
 9   humidity               18 non-null     int64         
 10  rainfall               10 non-null     float64       
 11  pressure               11 non-null     float64       
 12  dayName                18 non-null     object        
 13  date   

The dataset consists of 18 rows and 15 columns, with each row representing one hourly weather observation on a single day (2024-11-14, from 00:00 to 17:00). The columns hold the weather parameters and descriptive details for each observation.

In [6]:
# Count the null values in each column.
df.isnull().sum()

name                     0
temperature              7
symbol                   0
weatherDescription       0
text                     0
windSpeed                0
windGust                 0
cardinalWindDirection    0
windDirection            0
humidity                 0
rainfall                 8
pressure                 7
dayName                  0
date                     0
reportTime               0
dtype: int64

From the output, the following columns have missing values:

- Temperature: 7 missing entries.

- Rainfall: 8 missing entries.

- Pressure: 7 missing entries.

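Other columns such as name, symbol, weatherDescription, windSpeed, windDirection, and humidity have no null values. This does not mean they are complete: in the later rows, placeholders such as `-99`, `-`, and empty strings stand in for missing readings, and `isnull()` does not count them.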

In [7]:
# Describe the dataset.
df.describe()

|  | temperature | windDirection | humidity | rainfall | pressure | date |
|---|---|---|---|---|---|---|
| count | 11.0 | 18.0 | 18.0 | 10.0 | 11.0 | 18 |
| mean | 10.545455 | 45.0 | 11.333333 | 0.013 | 1036.818182 | 2024-11-14 00:00:00 |
| min | 9.0 | 0.0 | -99.0 | 0.0 | 1036.0 | 2024-11-14 00:00:00 |
| 25% | 10.0 | 0.0 | -99.0 | 0.0 | 1036.0 | 2024-11-14 00:00:00 |
| 50% | 11.0 | 0.0 | 99.0 | 0.0 | 1037.0 | 2024-11-14 00:00:00 |
| 75% | 11.0 | 0.0 | 100.0 | 0.01 | 1037.0 | 2024-11-14 00:00:00 |
| max | 12.0 | 315.0 | 100.0 | 0.1 | 1038.0 | 2024-11-14 00:00:00 |
| std | 1.035725 | 104.6844 | 101.54686 | 0.03093 | 0.750757 |  |


Based on the summary statistics:

- The temperature values range from 9°C to 12°C, with an average of 10.55°C.

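- Humidity statistics are distorted by the `-99` placeholder used for missing readings: it pulls the mean down to 11.33, while the median is 99 and the maximum is 100.

- Wind direction ranges from 0° to 315°, but at least three quarters of the values are 0, which likely includes missing or calm readings rather than true northerly winds.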

- Rainfall is very low on average, with a maximum value of 0.1 mm, indicating relatively dry conditions for the period.

- Pressure values are consistent, ranging from 1036 to 1038 hPa, with a mean of 1036.82 hPa.

**Conclusion:** Based on the metadata available on data.gov.ie and on the summary above, the dataset contains hourly weather observations for the current day at the Athenry station, including measurements such as temperature, rainfall, wind speed, humidity, and pressure. Data of this kind is typically used for monitoring weather patterns, climate analysis, and forecasting. The numerical columns, together with `date` and `reportTime`, support time-series analysis within a day. Because each file covers a single date, comparing weather across dates requires combining several downloaded files. Placeholder values such as `-99` should be treated as missing before any analysis.

******
### References
*****

[1] Create a directory: https://www.w3schools.com/python/ref_os_mkdir.asp

[2] Write current date and time to a file: https://stackoverflow.com/questions/54902053/how-to-get-the-current-date-and-time-in-specific-format-in-shell

[3] Append the output to a file: https://stackoverflow.com/questions/5342832/how-to-append-the-output-to-a-file

[4] View text files in the command line interface: https://www.geeksforgeeks.org/more-command-in-linux-with-examples/

[5] Create a file on specified directory: https://www.geeksforgeeks.org/touch-module-in-python/

[6] Display the list of files and directories: https://www.freecodecamp.org/news/python-list-files-in-a-directory-guide-listdir-vs-system-ls-explained-with-examples/

[7] Weather data for the Athenry weather station: https://prodapi.metweb.ie/observations/athenry/today

[8] Download a file from a server: https://www.scrapingbee.com/blog/python-wget/

[9] -O to save the file with a custom name: https://stackoverflow.com/questions/9830242/what-does-wget-o-mean

[10] Create a bash script: https://python.land/the-unix-shell/creating-bash-scripts

[11] Change the permissions of files and directories: https://www.warp.dev/terminus/chmod-x

[12] Today's weather Athenry, data.gov.ie: https://data.gov.ie/dataset/todays-weather-athenry

[13] Read JSON - Pandas: https://www.w3schools.com/python/pandas/pandas_json.asp

******
# End