# Report: on the tasks performed during computer infrastructure assessment 
# Task 8: Notebook
## Author : Andre Hoarau


# Task 1: Create Directory Structure
<ul>
<li>To perform this task I first had to make sure that I was at the root of my repository, which is named computer_infrastructure_assessment.
I used: 

```bash 
$ pwd
/workspaces/computer_infrastructure_assessment/
```

<li>pwd stands for print working directory; the command prints the full path of the directory my terminal is currently in. From the root of my repository, I could go up one directory with:

```bash
$ cd ..
/workspaces
```
<li>This uses the change directory command (cd) followed by .., which refers to the parent directory.
I can also use:

```bash
$ ls
20241104_091043_athenry.json  20241104_091426_athenry.json  20241104_091433_athenry.json  data      weather.ipynb
20241104_091145_athenry.json  20241104_091430_athenry.json  README.md                     edits.md  weather.sh
```

<li>The ls command (list) shows the contents of the current directory.

I could also use:

```bash
$ ls -all
total 64
drwxrwxrwx+ 4 codespace root       4096 Nov 10 09:00 .
drwxr-xrwx+ 5 codespace root       4096 Oct 19 13:30 ..
drwxrwxrwx+ 9 codespace root       4096 Nov 15 18:52 .git
-rw-rw-rw-  1 codespace root       3868 Oct 19 13:30 .gitignore
-rw-rw-rw-  1 codespace codespace  2710 Nov  4 09:10 20241104_091043_athenry.json
-rw-rw-rw-  1 codespace codespace  2710 Nov  4 09:11 20241104_091145_athenry.json
-rw-rw-rw-  1 codespace codespace  2710 Nov  4 09:14 20241104_091426_athenry.json
-rw-rw-rw-  1 codespace codespace  2710 Nov  4 09:14 20241104_091430_athenry.json
-rw-rw-rw-  1 codespace codespace  2710 Nov  4 09:14 20241104_091433_athenry.json
-rw-rw-rw-  1 codespace root        259 Oct 19 13:30 README.md
drwxrwxrwx+ 5 codespace codespace  4096 Nov 10 09:12 data
-rw-rw-rw-  1 codespace codespace    76 Oct 20 11:50 edits.md
-rw-rw-rw-  1 codespace codespace 10269 Nov 16 12:01 weather.ipynb
-rwxrw-rw-  1 codespace codespace   181 Nov 10 09:12 weather.sh
```
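<li> The -all option is read as the flags -a and -l together: -a shows all entries, including hidden ones whose names start with a . such as the .gitignore file, and -l prints the long listing format shown above.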
<p>
<li>Tab can also be used on the keyboard to autocomplete file names for convenience.
</p>
<li>The make directory command (mkdir) was then used to make the directories that I needed. 
The parent directory I needed was data so I made that first:

```bash
$ mkdir data
```


I then used cd to go into the data directory:

```bash
$ cd data/
/workspaces/computer_infrastructure_assessment/data
``` 


I then used mkdir to make the weather and timestamps subdirectories within the data directory:

```bash
$ mkdir weather
```

```bash
$ mkdir timestamps
```

Another option I could have used was:
```bash
$ mkdir -p data/{timestamps,weather}
```

The -p option tells mkdir to create any missing parent directories along the way, so I could have done all the preparation in one line from the root of the repository using the command above. With -p, mkdir creates the directories only if they do not already exist and does not report an error if they do.
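
The behaviour of -p can be checked safely away from the real repository. The following is a minimal sketch, assuming a throwaway directory created with `mktemp -d`; the names `workdir` and `rerun_status` are only for illustration, and the two subdirectories are listed explicitly rather than with the brace shorthand.

```bash
# Work in a throwaway directory so the real repository is not touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# -p creates the missing parent (data) along with each subdirectory.
mkdir -p data/timestamps data/weather

# Running the same command again is harmless: with -p, mkdir does not
# report an error for directories that already exist.
mkdir -p data/timestamps data/weather
rerun_status=$?

# Show the result: both subdirectories sit inside data.
ls data
echo "exit status of the second run: $rerun_status"
```

Without -p, the second mkdir would fail because the directories already exist, which is why -p is a convenient choice for setup commands that may be run more than once.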

<small>

References: 
* [pwd Command](https://www.ibm.com/docs/tr/aix/7.1?topic=p-pwd-command) - This document from IBM helped me to articulate pwd and its flags.
* [Creating directories (mkdir command)](https://www.ibm.com/docs/en/aix/7.1?topic=directories-creating-mkdir-command) - This document explained the mkdir command. 
* [How to create a parent directory with multiple subdirectories...](https://stackoverflow.com/questions/72826946/how-to-create-a-parent-directory-with-multiple-subdirectories-which-has-nested-s) - This Stack Overflow article showed me how to make multiple subdirectories using one command. 

</small>

# Task 2: Timestamps
* I started in the root of my directory and used the cd command to get into data/timestamps.
```bash
$ cd data/timestamps
/workspaces/computer_infrastructure_assessment/data/timestamps 
```
* I then used the following:
```bash
$ date >> now.txt
```

* The date command prints or sets the system date and time. Its output can be formatted, and the options are described in the manual page, which is opened by running:

```bash
$ man date
```

* Press q to quit the manual. By default, date prints the date and time in the format `Sun Oct 27 17:52:11 UTC 2024`, where UTC is the time zone (Coordinated Universal Time). The date command accepts various arguments that make it easier to modify or work with dates. For example:
```bash
$ date --help
$ date --h
```
* These print the documentation in the terminal, including useful information on date and time formatting.
* `>>` is an output redirection operator. In our command it takes the output of date, which would normally be printed to the terminal, and writes it to the file we specify, in this case now.txt. Because we used two greater-than symbols (`>>`), the output is appended, so the existing contents of the file are not overwritten as they would be with `>`. If now.txt did not exist, the shell would create it. `<` is the input redirection operator, which lets a command read its input from a file.
* Once now.txt had been created, each further run of the command appended another date to it. I ran the command 10 times, so the file contains 10 dates.
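
The difference between `>>` and `>` can be seen by counting lines. This is a minimal sketch rather than the exact commands I ran: it assumes a throwaway directory from `mktemp -d`, runs date three times instead of ten, and the variable names are only for illustration.

```bash
# Work in a throwaway directory so the real now.txt is not touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# >> appends: the file is created on first use and grows by one line per run.
for i in 1 2 3; do
    date >> now.txt
done
# < feeds the file to wc as its input, so wc prints only the line count.
appended_lines=$(wc -l < now.txt)

# > overwrites: the previous contents are replaced by a single new line.
date > now.txt
overwritten_lines=$(wc -l < now.txt)

echo "lines after >>: $appended_lines"
echo "lines after >:  $overwritten_lines"
```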
<small>

References: 
* [date(1) — Linux manual page](https://man7.org/linux/man-pages/man1/date.1.html) - The linux manual has a comprehensive guide for using the date command. 
* [Input and output redirection](https://www.ibm.com/docs/en/aix/7.2?topic=administration-input-output-redirection)- This document from IBM explained the output and input redirection operator.
* [Difference between “>” and “>>” in Linux](https://www.shells.com/l/en-US/tutorial/Difference-between-%E2%80%9C%3E%E2%80%9D-and-%E2%80%9C%3E%3E%E2%80%9D-in-Linux#:~:text=So%2C%20what%20we%20learned%20is,to%20modify%20files%20in%20Linux.) - This article explained the output redirection operator
</small>

# Task 3: Formatting Timestamps
* This task was very similar to the previous one, except that we had to take the formatting of the date into consideration. Using the `man` command mentioned above, I was able to read the reference information for the date command. man is very useful because it shows the manual installed alongside the command on the system, and it is convenient to read it in the command line. 
```bash
$ date +"%Y%m%d_%H%M%S"
20241117_184801
``` 
* This produces the date in the desired format.
* I then used the output redirection operator:
```bash
$ date +"%Y%m%d_%H%M%S" >> formatted.txt
```
* This created formatted.txt and wrote the formatted date into it.
* When formatting the date, the format string comes after the date command and starts with a `+`, followed by the relevant `%` specifiers.
* YYYY was achieved with `%Y`, as per the manual. Case sensitivity is crucial here, as `%y` gives the year in the two-digit YY format. 
* mm was achieved with `%m`.
* dd was achieved with `%d`, as `%D` would give the date as `%m/%d/%y`, for example 10/27/24.
* HH was achieved with `%H`, as `%h` would give an abbreviation of the month name, such as Oct. It is the same as `%b`.
* MM was achieved with `%M`; as we saw above, `%m` is for the month.
* SS was achieved with `%S`, as `%s` would give the number of seconds since `1970-01-01 00:00:00 UTC`. According to the article referenced below, this starting point was chosen by the Unix engineers at the time as a convenient date, and seconds were used to prevent short-term overflow.
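
The effect of upper and lower case in the specifiers is easiest to see on a fixed moment. The sketch below assumes GNU date, whose `-d` option formats a given date string and whose `-u` option works in UTC; the moment used is the example timestamp from Task 2, and the variable names are only for illustration.

```bash
# A fixed moment, so the results are the same on every run.
when="2024-10-27 17:52:11"

stamp=$(date -u -d "$when" +"%Y%m%d_%H%M%S")      # 20241027_175211
short_year=$(date -u -d "$when" +"%y")            # 24
us_date=$(date -u -d "$when" +"%D")               # 10/27/24
# LC_ALL=C keeps the month abbreviation in English regardless of locale.
month_abbr=$(LC_ALL=C date -u -d "$when" +"%h")   # Oct
epoch=$(date -u -d "$when" +"%s")                 # seconds since 1970-01-01 00:00:00 UTC

echo "$stamp | $short_year | $us_date | $month_abbr | $epoch"
```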

<small>

References: 
* The man date provided a lot of information on using the date command and formatting.
* [man Command](https://www.ibm.com/docs/es/aix/7.2?topic=m-man-command) - This IBM document enabled me to explain the man command. 
* [Unix Tick Tocks to a Billion](https://www.wired.com/2001/09/unix-tick-tocks-to-a-billion/) - This is an article about the %s and why 01-01-1970 was chosen.

</small>

# Task 4: Create Timestamped Files 
* This task required the date in the format that we had already used for the formatted.txt file, but this time as the name of the txt file we were creating. 
```bash
$ touch  "`date +"%Y%m%d_%H%M%S".txt`"
``` 
* touch is a command that changes a file's timestamps to the current system time. However, if the named file does not exist, touch creates it as an empty file, so it is also commonly used to create files. 
* There are options that give greater control over the touch command, for example:
```bash
$ touch -c "Filename"
```
* With -c, touch does not create the file if it does not exist, so no new file is created. 

```bash
$ touch -t STAMP "Filename"
```
* With -t, the user specifies the timestamp to apply instead of the current system time. 
* STAMP must be in the following format: `[[CC]YY]MMDDhhmm[.ss]`

* CC: The first two digits of the year (optional).
* YY: The last two digits of the year (optional).
* MM: Month (01–12).
* DD: Day of the month (01–31).
* hh: Hour (00–23).
* mm: Minutes (00–59).
* ss: Seconds (00–59, optional).

* The backticks (`` ` ``) ensure that the command between them is executed first and that its output is substituted into the line before the outer command runs. This lets us generate the date in the format we specified, with .txt added at the end, before the touch command is run.

* This is an alternative to using the output redirection operator and gives us an empty txt file whose name is the date. 
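
The touch options and the backticks can be tried together in a scratch directory. This is a minimal sketch under a few assumptions: the directory comes from `mktemp -d`, the file names `does_not_exist.txt` and `stamped.txt` are hypothetical, and reading the modification time back with `date -r` relies on GNU date.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The command inside the backticks runs first; its output is stored in stamp.
stamp=`date +%Y%m%d_%H%M%S`
touch "$stamp.txt"

# -c never creates a file, so this name stays absent.
touch -c "does_not_exist.txt"

# -t applies the given STAMP (17 Nov 2024, 18:30:00) instead of the current time.
touch -t 202411171830.00 "stamped.txt"
stamped_mtime=`date -r "stamped.txt" +%Y%m%d%H%M.%S`

ls
echo "stamped.txt was last modified at $stamped_mtime"
```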
<small>

References: 
* man touch was where I obtained information on the touch command and some of the options that it accepts.
* [What does ` (backquote/backtick) mean in commands?](https://unix.stackexchange.com/questions/27428/what-does-backquote-backtick-mean-in-commands) - This forum thread provided information and examples on how to use backticks.

</small>

# Task 5: Download Today's Weather Data
* This task required the use of `wget` to download a file that would be called "weather.json". 
* `wget` downloads files from the internet into your directory. 
* `wget` supports downloading via HTTP (Hypertext Transfer Protocol), HTTPS (Hypertext Transfer Protocol Secure) and FTP (File Transfer Protocol).
* `-O` is the option that we used with `wget` to name the downloaded file `"weather.json"`.
* The file is in a format called JSON, which stands for JavaScript Object Notation. It is a format for storing and transporting data. 
* It is commonly used to send data from servers to web pages. It is self-describing and relatively readable. See an example below:  
```
{
"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]
}
```
* Benefits of JSON include its flexible formatting, wide usability between systems and languages, and its easy-to-parse data format.
* The data was obtained from an API (Application Programming Interface).
* An API is a software intermediary that allows two applications to communicate. In this case my Codespaces virtual machine is requesting data from Met Éireann's server or database. 
* The URL `https://prodapi.metweb.ie/observations/athenry/today` specifies where to send the request (the address) and what to request (weather from Athenry today).
* When the wget command is run, a request is made to the API, and the API responds with the data in JSON format.
* Some APIs require a key for added authentication.
* Putting this together, the full command used to download the JSON file was:
```bash
$ wget -O "weather.json" https://prodapi.metweb.ie/observations/athenry/today
```
<small>

References: 
* [What is the WGET Command](https://www.hostinger.com/tutorials/wget-command-examples/#:~:text=Wget%20is%20a%20command%2Dline,SFTP%2C%20HTTP%2C%20and%20HTTPS.) - This site explained wget and showed some useful examples.
* [What is JSON](https://www.w3schools.com/whatis/whatis_json.asp) - This article introduced JSON.
* [What is JSON](https://www.oracle.com/ie/database/what-is-json/#:~:text=JSON%20is%20popular%20with%20developers,and%20interpret%20the%20data%20provided.) - This article by oracle went into more detail and provided the JSON benefits.
* [What is an API](https://www.mulesoft.com/api/what-is-an-api)- Article from MuleSoft explaining APIs

</small>



# Task 6: Timestamp the Data
* This task required modification of task 5 to save the downloaded file with a timestamped name in the format YYYYmmdd_HHMMSS.json.
* This was just a combination of the above:
```bash
$ wget -O "weather.json" https://prodapi.metweb.ie/observations/athenry/today
```
and 
```bash
$ touch  "`date +"%Y%m%d_%H%M%S".txt`"
``` 
to create:
```bash
$ wget -O "`date +"%Y%m%d_%H%M%S".json`" https://prodapi.metweb.ie/observations/athenry/today
2024-11-17 19:55:26 (780 MB/s) - ‘20241117_195525.json’ saved [6014]
```
which saved the JSON file under the timestamped name, as specified.


# Task 7: Write the Script
* This task required taking the command above and automating it by putting it into a script called `weather.sh` in the root of the repository. The script also had to be made executable and tested.
* I ensured I was in the root of the repository by running:
```bash
$ cd ..
/workspaces/computer_infrastructure_assessment
```
* I then created the file `weather.sh`. I used the GUI functionality in VS Code, but could have done:
```bash
$ touch "weather.sh"
```
* Once in `weather.sh` it is important to include the shebang. The shebang specifies which interpreter should run the script and must be the first line of the script. 
* The shebang for bash is: 
```
#!/bin/bash
```
* Next, using the existing command from task 6, we were able to create our script:
```bash
date
echo "Downloading weather data"
wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
echo "Weather data downloaded"
date
```
* The main part of this script is the wget command, but I also added `date` followed by `echo "Downloading weather data"`, simply to show the time that the download began.
* Once the JSON file is downloaded there is another `echo "Weather data downloaded"` to confirm that the download is done, followed by `date` to show the finish time.
* `echo` is the equivalent of `print` in Python: it prints a message to the terminal. 
* The script is now ready to test. To run it as `./weather.sh` you must be in the same directory as the script. Here that is the root of the repository, which is also where the script's relative path `data/weather/` expects to be run from.
```bash 
$ pwd
/workspaces/computer_infrastructure_assessment
```
* Then you use:
```bash
$ ./weather.sh
```
* However, when I initially did this I got a "permission denied" error. 
* Looking at my files in long format:
```bash
$ ls -all
-rw-rw-rw-  1 codespace codespace     0 Nov 17 21:11 weather.sh
```
* The `-rw-rw-rw-` means that this file is readable and writable but not executable, so the shell refuses to run it as a program.
* So I then ran:
```bash
$ chmod u+x ./weather.sh
$ ls -all
-rwxrw-rw-  1 codespace codespace   187 Nov 17 20:07 weather.sh
```
* Now the script is executable and was able to run. The permissions now begin with `rwx`, which shows that the owner can execute the file.
* VS Code will also show weather.sh in a green font.
* chmod is short for change mode; the command changes access permissions and special mode flags.
* `u+x` adds (`+`) the execute permission (`x`) for the user who owns the file (`u`).
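
The permission change can be reproduced on a stand-in script without touching `weather.sh`. The following is a minimal sketch, assuming a throwaway directory from `mktemp -d` and a hypothetical `demo.sh` that only prints a message; the variable names are for illustration.

```bash
# Work in a throwaway directory so the real weather.sh is not touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A stand-in script with the shebang on its first line.
printf '#!/bin/bash\necho "hello from the script"\n' > demo.sh

# A newly created file has no execute permission.
if [ -x demo.sh ]; then before="executable"; else before="not executable"; fi

# u+x adds execute permission for the owning user only.
chmod u+x demo.sh
if [ -x demo.sh ]; then after="executable"; else after="not executable"; fi

# The script can now be run by its path.
output=$(./demo.sh)
echo "before: $before"
echo "after: $after"
echo "script output: $output"
```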


<small>

References: 
* [How to Write a Bash Script: A Simple Bash Scripting Tutorial](https://www.datacamp.com/tutorial/how-to-write-bash-script-tutorial) - A helpful guide on how to write a Bash script from DataCamp.
* [Shell scripts and other related concepts](https://hbctraining.github.io/Intro-to-shell-flipped/lessons/05_shell-scripts_variable.html#:~:text=The%20echo%20command%20is%20used,screen%20or%20to%20a%20file.)- Github article on shell scripts provided explanation of echo.

* [How do I execute a bash script in Terminal?](https://stackoverflow.com/questions/2177932/how-do-i-execute-a-bash-script-in-terminal)- stack overflow on how to run a script in bash.

* [chmod](https://en.wikipedia.org/wiki/Chmod)- Article explaining chmod.
</small>

# Task 9: pandas
* The final task requires using pandas to read one of the JSON files downloaded by the `weather.sh` script, and to examine and summarise the data. I also used [data.gov.ie] to write an explanation of the data set contents.

In [None]:
# Import what we will need
import pandas as pd 
from IPython.display import display


In [4]:
# Let's load one of the files we created.
df = pd.read_json("data/weather/20241117_200757.json")
# Let's take a look at it.
df.head()


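|   | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|------|-------------|--------|--------------------|------|-----------|----------|-----------------------|---------------|----------|----------|----------|---------|------|------------|
| 0 | Athenry | 6 | 04n | Cloudy | "Cloudy" | 9 | - | W | 270 | 89 | 0.0 | 1021 | Sunday | 2024-11-17 | 00:00 |
| 1 | Athenry | 7 | 04n | Cloudy | "Cloudy" | 4 | - | W | 270 | 90 | 0.0 | 1020 | Sunday | 2024-11-17 | 01:00 |
| 2 | Athenry | 7 | 04n | Cloudy | "Cloudy" | 7 | - | W | 270 | 89 | 0.0 | 1020 | Sunday | 2024-11-17 | 02:00 |
| 3 | Athenry | 7 | 04n | Cloudy | "Cloudy" | 6 | - | W | 270 | 90 | 0.0 | 1020 | Sunday | 2024-11-17 | 03:00 |
| 4 | Athenry | 8 | 04n | Cloudy | "Cloudy" | 6 | - | W | 270 | 90 | 0.0 | 1019 | Sunday | 2024-11-17 | 04:00 |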


In [5]:
# Let's see all the columns
df.columns

Index(['name', 'temperature', 'symbol', 'weatherDescription', 'text',
       'windSpeed', 'windGust', 'cardinalWindDirection', 'windDirection',
       'humidity', 'rainfall', 'pressure', 'dayName', 'date', 'reportTime'],
      dtype='object')

In [6]:
# Let's count the number of entries
len(df)

20

* The weather variables for each hour that are obtained from this weather station are as follows:

|Variables                |
|-----------              |
|name                     |
|temperature              |
|weather description      |
|windspeed(kt)            |
|cardinal wind direction  |
|Relative humidity (%)    |
|Rainfall (mm)            |
|msl Pressure (mbar)      |
|Day of the week          |
|Date                     |
|Time of observation      |   

In [7]:
# describe() will allow us to gain some initial insights into the data
df.describe()

|       | temperature | windSpeed | windDirection | humidity | rainfall | pressure | date |
|-------|-------------|-----------|---------------|----------|----------|----------|------|
| count | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20 |
| mean  | 9.05 | 8.3 | 267.75 | 89.9 | 0.0605 | 1017.15 | 2024-11-17 00:00:00 |
| min   | 6.0 | 4.0 | 225.0 | 85.0 | 0.0 | 1014.0 | 2024-11-17 00:00:00 |
| 25%   | 8.0 | 7.0 | 270.0 | 87.0 | 0.0 | 1015.0 | 2024-11-17 00:00:00 |
| 50%   | 9.5 | 7.0 | 270.0 | 90.0 | 0.0 | 1017.0 | 2024-11-17 00:00:00 |
| 75%   | 10.0 | 9.5 | 270.0 | 92.0 | 0.0025 | 1019.0 | 2024-11-17 00:00:00 |
| max   | 11.0 | 13.0 | 270.0 | 97.0 | 0.5 | 1021.0 | 2024-11-17 00:00:00 |
| std   | 1.571958 | 2.451637 | 10.062306 | 3.338768 | 0.142699 | 2.230766 | |


In [9]:
# Let's look at the temperature information from Sunday 17th November
max_temp = df["temperature"].max()
min_temp = df["temperature"].min()
# idxmax/idxmin return index labels, so use label-based .loc to fetch the rows
max_temp_row = df.loc[df["temperature"].idxmax()]
min_temp_row = df.loc[df["temperature"].idxmin()]
max_temp_time = max_temp_row["reportTime"]
min_temp_time = min_temp_row["reportTime"]
location = df["name"].iloc[0]
date = df[["dayName", "date"]].iloc[0]  
date_str = f"{date['dayName']} {date['date']}"  # Combine dayName and date into a readable string

print(f"The maximum temperature achieved on {date_str} at {location} was {max_temp} at {max_temp_time}. The minimum temperature was {min_temp} at {min_temp_time}.")


The maximum temperature achieved on Sunday 2024-11-17 00:00:00 at Athenry was 11 at 12:00. The minimum temperature was 6 at 00:00.


In [22]:
# Another useful analysis is a daily summary of the data. We could in theory use this to compare against other days' data.
daily_summary = {"Average Temperature (°C)" : df["temperature"].mean(),
                    "Max Temperature (°C)" : df["temperature"].max(),
                    "Min Temperature (°C)" : df["temperature"].min(),
                    "Total Precipitation (mm)" : df["rainfall"].sum(),
                    "Average Windspeed (kt)" : df["windSpeed"].mean(),
                    # mode() returns a Series, so take its first value to get a single number
                    "Most Common Wind Direction (°)" : df["windDirection"].mode().iloc[0],
                    "Average Humidity (%)" : df["humidity"].mean()}
format_date_str = date_str.split()[1]
daily_summary_df = pd.DataFrame(daily_summary, index=[format_date_str])  

display(daily_summary_df)


|            | Average Temperature (°C) | Max Temperature (°C) | Min Temperature (°C) | Total Precipitation (mm) | Average Windspeed (kt) | Most Common Wind Direction (°) | Average Humidity (%) |
|------------|--------------------------|----------------------|----------------------|--------------------------|------------------------|--------------------------------|----------------------|
| 2024-11-17 | 9.05 | 11 | 6 | 1.21 | 8.3 | 270 | 89.9 |


### End
