# Computer Infrastructure Module Assessments.

This Jupyter notebook contains a set of tasks that are part of the Computer Infrastructure module assessment. The tasks and their solutions are listed below.

To efficiently complete tasks, understanding the directory structure and knowing how to navigate between directories is essential.

Here are some fundamental terminal navigation commands:

`ls`: used to display directories and files in the current directory.

`cd [directory]`: used to switch to the specified directory.

`cd ..`: used to move up to the parent directory.

`pwd`: used to show the full path of the current working directory.

`mkdir [directory_name]`: used to create a new directory with the given name.

For detailed information about any of these commands, use `man [command_name]`.
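The commands above can be tried together in a short session. This is a minimal sketch that assumes a throwaway directory created with `mktemp -d`; the directory name `demo` is hypothetical and used only for illustration.

```bash
# Work in a throwaway directory so nothing in the repository is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir demo   # create a directory named "demo"
cd demo      # switch into it
pwd          # print the full path, which ends in /demo
cd ..        # move back up to the parent directory
ls           # list the contents of the parent directory
```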

### Task 1: Create Directory Structure

Using the command line, create a directory (that is, a folder) named `data` at the root of your repository. Inside `data`, create two subdirectories: `timestamps` and `weather`.

#### Solution:

Make sure you are in the root of your repository before creating the new directories. The `mkdir` command is used to build a directory structure. For guidance on using this command, refer to `man mkdir`. The manual for `mkdir` is also available at man7.org.

There are two approaches to creating a directory structure:

1. Step-by-Step Creation:

    To create a data directory, use: 
    
    ```bash
    mkdir data
    ```
    
    Then, switch to this directory using: 
    
    ```bash
    cd data
    ```

    Create subdirectories named timestamps and weather:

    ```bash
    mkdir timestamps
    mkdir weather
    ```

2. Single Command Creation:

    Use `mkdir -p` to create each subdirectory together with its parent directory in one step:

    ```bash
    mkdir -p data/timestamps
    mkdir -p data/weather
    ```

    The `-p` (`--parents`) option is advantageous because it creates any missing parent directories and does not report an error if a directory already exists.
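Since `mkdir` accepts several paths at once, the whole structure can also be created with a single invocation. The sketch below runs in a temporary directory (an assumption, so the repository is left untouched) and shows how `-p` behaves when the directories already exist.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# One command, two target paths; -p creates the missing parent "data" as well.
mkdir -p data/timestamps data/weather

# Running the same command again is harmless because -p accepts existing directories.
mkdir -p data/timestamps data/weather

# Without -p, mkdir refuses to create a directory that already exists.
if mkdir data 2>/dev/null; then
    echo "data created"
else
    echo "mkdir without -p failed: data already exists"
fi

ls data
```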

### Task 2: Timestamps

Navigate to the `data/timestamps` directory. Use the `date` command to output the current date and time, appending the output to a file named `now.txt`. Make sure to use the `>>` operator to append (not overwrite) the file. Repeat this step ten times, then use the more command to verify that `now.txt` has the expected content.

#### Solution:

To navigate from the root of the repository, use the following:

```bash
cd data
```

and then 

```bash
cd timestamps
```

A faster way is to change directly into the nested directory:

```bash
cd data/timestamps
```

To realise the task, use the command `date`, which prints or sets the system date and time. The syntax is:

```bash
date >> now.txt
```

This command appends the current date and time to the `now.txt` file.
Then, repeat this step ten times. This can be done manually by executing the previous command ten times.<br> Alternatively, a ***for*** loop can be used, in which the user declares how many times the loop body is executed.<br>

Here is the syntax to automate this task:

```bash
for i in {1..10}; do date >> now.txt; sleep 10; done
```

`sleep` pauses execution for the given duration. A plain number is interpreted as seconds (for example, `1` is 1 second); the suffixes `m`, `h`, and `d` denote minutes (for example, `2m` is 2 minutes), hours (for example, `1h` is 1 hour), and days (for example, `1d` is 1 day).

To find out more about the `for` loop, visit [Cyberciti website - Bash For Loop Examples](https://www.cyberciti.biz/faq/bash-for-loop/). To better understand the `sleep` command, go to [Cyberciti website - Linux / UNIX Bash Script Sleep or Delay a Specified Amount of Time](https://www.cyberciti.biz/faq/linux-unix-sleep-bash-scripting/).
Note that `date` prints the time to the nearest second, so without `sleep` the loop finishes almost instantly and most or all records in the `now.txt` file will have the same value. The interval passed to `sleep` determines how often the date is appended to the text file.


To check the result, display the file with the `more` command:

```bash
more now.txt
```

Visit [Man7 website](https://man7.org/linux/man-pages/man1/more.1.html) for more information about the `more` command.
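Because `more` is interactive, a non-interactive check can be handy as well. The following sketch is an illustration under stated assumptions: it uses a temporary directory, only three iterations, and a one-second pause so that it finishes quickly. It counts the lines written and the number of distinct timestamps.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Append three timestamps, one second apart.
for i in 1 2 3; do
    date >> now.txt
    sleep 1
done

# Total number of lines appended to the file.
wc -l < now.txt

# Number of distinct timestamps; with a pause of at least one second
# between calls, every line should be different.
sort -u now.txt | wc -l
```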

### Task 3: Formatting Timestamps

Run the `date` command again, but this time format the output using YYYYmmdd_HHMMSS (e.g., 20261114_130003 for 1:00:03 PM on November 14, 2026). Refer to the `date` man page (using `man date`) for more formatting options. (Press q to exit the man page). Append the formatted output to a file named `formatted.txt`.

#### Solution:

Ensure you are in the correct directory by using the `pwd` command to verify the path.

To complete the task, refer to the manual for the `date` command with `man date`. 
This will provide guidance on formatting the output to the YYYYmmdd_HHMMSS form. According to `man date`, 
`%Y` is the four-digit year, `%m` is the month (01 to 12), `%d` is the day of the month, `%H` is the 
hour (00 to 23), `%M` is the minute, and `%S` is the second. 
The syntax is:

`date +"%Y%m%d_%H%M%S"`

To append the previously formatted date to the text file formatted.txt, use the command:

```bash
date +"%Y%m%d_%H%M%S" >> formatted.txt
```

Note: make sure to use `>>` instead of `>`, because `>` overwrites any existing data in the `formatted.txt` file.
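The difference between the two redirection operators can be observed directly. This is a small sketch that runs in a temporary directory (an assumption made to keep the repository's `formatted.txt` intact) and counts the lines after each step.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Appending twice leaves two lines in the file.
date +"%Y%m%d_%H%M%S" >> formatted.txt
date +"%Y%m%d_%H%M%S" >> formatted.txt
wc -l < formatted.txt

# A single > truncates the file first, so only one line remains.
date +"%Y%m%d_%H%M%S" > formatted.txt
wc -l < formatted.txt

cat formatted.txt
```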

### Task 4: Create Timestamped Files

Use the `touch` command to create an empty file with a name in the `YYYYmmdd_HHMMSS.txt` format. You can achieve this by embedding your date command in backticks \` into the `touch` command. You should no longer use redirection (`>>`) in this step.

#### Solution:

The `touch` command serves two purposes: creating a new empty file if it does not already exist, or updating the last modified timestamp of an existing file. To understand the specifics of using the `touch` command, consult the manual using `man touch`. Placing the `date` command between backticks is called command substitution: the Bash interpreter runs `date` first and passes its output to `touch` as the filename.

Here is the syntax:

```bash
touch `date +"%Y%m%d_%H%M%S".txt`
```





Executing this command will create an empty text file with a name that reflects the date and time of its creation.
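Bash also supports the `$(...)` form of command substitution, which behaves like backticks and is easier to read when nested. The sketch below is an alternative illustration, not the form the task asks for; the variable name `stamp` and the temporary directory are assumptions.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Capture the timestamp once so the same value can be reused.
stamp=$(date +"%Y%m%d_%H%M%S")

# Create the empty file named after the timestamp.
touch "${stamp}.txt"

ls

# -s is true only for files larger than zero bytes, so this confirms the file is empty.
[ -s "${stamp}.txt" ] || echo "${stamp}.txt is empty"
```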

### Task 5: Download Today's Weather Data

Change to the `data/weather` directory. Download the latest weather data for the Athenry weather station from Met Eireann using `wget`. Use the ***-O filename*** option to save the file as `weather.json`. The data can be found at this URL:
https://prodapi.metweb.ie/observations/athenry/today.

#### Solution:

Check that you are in the proper directory. If not, change the directory to `data/weather`.

The `wget` command is used to download data from the network from the command line.

Use the command:

```bash
man wget
```

to learn more about how to use it.

The syntax to realise the task is:

```bash
wget -O weather.json https://prodapi.metweb.ie/observations/athenry/today
```

where the `-O` option lets the user specify the name of the file in which the downloaded data is saved.
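A download can fail, for example when the network is unavailable, so it can be useful to check the exit status of `wget`. The following is a hedged sketch rather than part of the task: the temporary directory, the `url` variable, and the choice of `--tries=1` and `--timeout=10` are assumptions made for illustration.

```bash
workdir=$(mktemp -d)
cd "$workdir"

url="https://prodapi.metweb.ie/observations/athenry/today"

# -q silences progress output; --tries and --timeout keep wget from retrying for a long time.
if wget -q --tries=1 --timeout=10 -O weather.json "$url"; then
    echo "Saved $(wc -c < weather.json) bytes to weather.json"
else
    echo "Download failed" >&2
    # wget -O can leave an empty output file behind when the download fails, so remove it.
    rm -f weather.json
fi
```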

### Task 6: Timestamp the Data

Modify the command from Task 5 to save the downloaded file with a timestamped name in the format YYYYmmdd_HHMMSS.json.

#### Solution:

As before, make sure you are in the proper directory. Alternatively, the data can be saved into the expected directory from elsewhere by giving `-O` a path to that location.

Then, using knowledge from previous tasks about backticks and the `wget` command, we can download 
the data and save it to a JSON file whose name is the current timestamp. 

The syntax is:
```bash
wget -O `date +"%Y%m%d_%H%M%S".json` https://prodapi.metweb.ie/observations/athenry/today
```

### Task 7: Write the Script

Write a bash script called weather.sh in the root of your repository. This script should automate the process from Task 6, saving the weather data to the data/weather directory. 
Make the script executable and test it by running it.

#### Solution:

First, make sure you're in the root directory of your repository.
You can create the weather.sh file using your preferred text editor. Visual Studio Code is a good option if you want a feature-rich interface, or you can use terminal-based editors like Nano or Vi if you're in a bash environment.

Inside the file, add the following:
```bash
#! /bin/bash

wget -O data/weather/`date +"%Y%m%d_%H%M%S".json` https://prodapi.metweb.ie/observations/athenry/today
```
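The first line is a shebang (`#!`), the marker at the beginning of a script file that indicates which interpreter should execute the script; here, the script will be run using the bash shell. The commands to run follow it. In this case that is the command from Task 6, with `data/weather/` added in front of the output filename so that the file is saved in the right directory when the script is run from the root of the repository.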

To make the file executable, use the chmod command:
```bash
chmod u+x weather.sh
```
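This command changes the file permissions to make the script executable; `u+x` adds execute permission for the file's owner. For more details about the `chmod` command, simply type `man chmod`. The script can then be tested from the root of the repository by running `./weather.sh`.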
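The effect of `chmod u+x` can be seen on a small stand-in script. This sketch assumes a temporary directory and a hypothetical file `demo.sh` that only prints a message, so it can be run without downloading anything.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# A hypothetical two-line script standing in for weather.sh.
printf '#!/bin/bash\necho "script ran"\n' > demo.sh

# A newly created file is normally not executable.
[ -x demo.sh ] || echo "demo.sh is not executable yet"

chmod u+x demo.sh   # add execute permission for the file's owner
ls -l demo.sh       # the permission string now shows an x for the owner
./demo.sh           # run the script
```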

### Task 8: Notebook

Create a notebook called weather.ipynb at the root of your repository. In this notebook, write a brief report explaining how you completed Tasks 1 to 7. Provide short 
descriptions of the commands used in each task and explain their role in completing the tasks.

#### Solution:

The result of the current task is the Jupyter Notebook weather.ipynb, which the user already has open.  Inside, brief reports explain how Tasks 1 to 7 were completed, with detailed information provided in the task solutions.

### Task 9: pandas
In your weather.ipynb notebook, use the pandas function read_json() to load in any one of the weather data files you have downloaded with your script. Examine and summarize the 
data. Use the information provided on data.gov.ie to write a short explanation of what the data set contains.

#### Solution:

To complete this task, importing pandas into the Jupyter Notebook is necessary.

In [1]:
# Import the library.
import pandas as pd

Then, the function `read_json` is used to read data from the selected JSON file.

In [2]:
# Read the data from the json file.
data = pd.read_json('data/weather/20241121_233356.json')

The `head()` function is used to find basic information about the data inside the JSON file. The function provides the first few rows of the dataset to check the data structure and contents quickly.

In [3]:
# Examine the data.
data.head(5)

| | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Athenry | 1 | 49n | Light snow | "Light snow" | 20 | - | E | 90 | 88 | 0.2 | 1006 | Thursday | 2024-11-21 | 00:00 |
| 1 | Athenry | 1 | 08n | Snow showers | "Snow shower" | 17 | - | E | 90 | 88 | 0.01 | 1005 | Thursday | 2024-11-21 | 01:00 |
| 2 | Athenry | 1 | 49n | Light snow | "Light snow" | 20 | - | NE | 45 | 94 | 0.6 | 1004 | Thursday | 2024-11-21 | 02:00 |
| 3 | Athenry | 0 | 49n | Light snow | "Light snow" | 20 | - | E | 90 | 97 | 0.8 | 1004 | Thursday | 2024-11-21 | 03:00 |
| 4 | Athenry | 0 | 49n | Light snow | "Light snow" | 15 | - | NE | 45 | 98 | 1.0 | 1004 | Thursday | 2024-11-21 | 04:00 |


The `info()` function gives a summary of the dataset, including the data types of each column and non-null counts.

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   name                   24 non-null     object        
 1   temperature            24 non-null     int64         
 2   symbol                 24 non-null     object        
 3   weatherDescription     24 non-null     object        
 4   text                   24 non-null     object        
 5   windSpeed              24 non-null     int64         
 6   windGust               24 non-null     object        
 7   cardinalWindDirection  24 non-null     object        
 8   windDirection          24 non-null     int64         
 9   humidity               24 non-null     int64         
 10  rainfall               24 non-null     float64       
 11  pressure               24 non-null     int64         
 12  dayName                24 non-null     object        
 13  date   

Conclusion:

The `data.info()` output reveals several important aspects of the dataset:

Total Entries: There are 24 entries (indexed 0 to 23).

Columns: 
The dataset includes 15 columns with different data types:
* 8 columns are of type object (e.g., name, symbol, weatherDescription).
* 5 columns are of type int64 (e.g., temperature, windSpeed, windDirection).
* 1 column is of type float64 (rainfall).
* 1 column is of type datetime64[ns] (date).

Non-Null Values: All columns contain 24 non-null values, indicating there are no missing values.

The summary offers a straightforward description of how the dataset is structured and what information it contains.

The `isnull().sum()` call identifies and counts the missing values in each column of the dataset.

In [5]:
# Identify the missing values.
data.isnull().sum()

name                     0
temperature              0
symbol                   0
weatherDescription       0
text                     0
windSpeed                0
windGust                 0
cardinalWindDirection    0
windDirection            0
humidity                 0
rainfall                 0
pressure                 0
dayName                  0
date                     0
reportTime               0
dtype: int64

There are no missing values in the data set.

The `describe()` function offers descriptive statistics for numerical columns, such as mean, median, and standard deviation.

In [6]:
# Examine the data.
data.describe()

| | temperature | windSpeed | windDirection | humidity | rainfall | pressure | date |
|---|---|---|---|---|---|---|---|
| count | 24.0 | 24.0 | 24.0 | 24.0 | 24.0 | 24.0 | 24 |
| mean | 1.083333 | 11.666667 | 157.5 | 91.041667 | 0.229583 | 1005.75 | 2024-11-21 00:00:00 |
| min | 0.0 | 4.0 | 0.0 | 78.0 | 0.0 | 1004.0 | 2024-11-21 00:00:00 |
| 25% | 0.0 | 9.0 | 45.0 | 88.0 | 0.0 | 1004.0 | 2024-11-21 00:00:00 |
| 50% | 1.0 | 11.0 | 157.5 | 92.0 | 0.0 | 1006.0 | 2024-11-21 00:00:00 |
| 75% | 1.0 | 15.0 | 270.0 | 94.0 | 0.125 | 1007.0 | 2024-11-21 00:00:00 |
| max | 4.0 | 20.0 | 315.0 | 99.0 | 1.5 | 1008.0 | 2024-11-21 00:00:00 |
| std | 1.138904 | 4.48831 | 123.058841 | 5.034483 | 0.453254 | 1.452135 | |


Conclusion:

Looking at the statistical summary of the numerical columns in the weather dataset, a few points about the weather patterns can be inferred:
* Temperature: The average temperature is about 1.1°C, with a minimum of 0°C and a maximum of 4°C. A standard deviation of about 1.1 indicates only small temperature fluctuations.
* Wind Speed: The average wind speed is about 11.7 km/h, with speeds ranging from 4 km/h to 20 km/h. The standard deviation of about 4.5 suggests moderate variability.
* Wind Direction: The mean wind direction is 157.5 degrees, with values ranging from 0 to 315 degrees, showing diverse wind patterns. The standard deviation is about 123.1. Because direction is a circular quantity, the arithmetic mean should be read with caution.
* Humidity: Humidity levels are consistently high, with an average of about 91.0% and a range from 78% to 99%. The standard deviation is fairly low at about 5.0.
* Rainfall: The average rainfall is about 0.23 mm, with a maximum of 1.5 mm. The median is 0 mm, so at least half of the hourly reports recorded no rainfall. The standard deviation is about 0.45.
* Pressure: The average atmospheric pressure is 1005.75 hPa, with values ranging from 1004 to 1008 hPa. The standard deviation is about 1.45.

These observations offer a comprehensive dataset overview, highlighting trends and variations in the observed weather conditions.

***

### The project.

The goal of this project is to automate the daily execution of the weather.sh script and push the updated data to the user's computer-infrastructure repository.

The following steps will create the necessary GitHub Actions workflow:

1. [Create a GitHub Actions Workflow: ](#solution-of-point-1)<br>
In your repository, create a folder called .github/workflows/ (if it doesn't already exist). Inside this folder, create a file called weather-data.yml. This file defines the GitHub Actions workflow.
2. [Run Daily at 10 am:](#solution-of-the-point-2)<br>Use the schedule event with cron to set the script to run once a day at 10 am. Also include the workflow_dispatch event so you can test the workflow.
3. [Use a Linux Virtual Machine: ](#solution-of-point-3)<br> In the workflow file, specify that an Ubuntu virtual machine should be used to run the action.
4. [Clone the Repository: ](#solution-of-point-4)<br>Have the workflow clone your repository.
5. [Execute the weather.sh script: ](#solution-of-point-5)<br>Add a step that runs your weather.sh script.
6. [Commit and Push Changes Back to the repository: ](#solution-of-point-6)<br>Finally, configure the workflow to commit the new weather data and push those changes back to your repository.
7. [Test the Workflow: ](#solution-of-point-7)<br>Commit and push the workflow to your repository. Check the GitHub logs to ensure that the weather.sh script runs correctly, and new data is being committed.

To create the expected YAML file, ChatGPT was used with the following prompt:

"Please generate a GitHub Actions workflow YAML file named Weather Data Processing. The workflow should run every day at 10 AM UTC and can also be triggered manually. The workflow should include a job that clones the repository and runs a weather.sh script, and commits and pushes any changes made."

The YAML file, as outlined in the [GitHub Docs](https://docs.github.com/en/actions/about-github-actions/understanding-github-actions), has the following structure:
* Events: A specific activity in a repository that triggers a workflow run.
* Runners: A runner is a server that executes your workflows when triggered. It can run one job at a time. GitHub offers runners for Ubuntu, Windows, and macOS. Each workflow runs in a fresh virtual machine.

Each runner executes a job, defined as a sequence of steps run on the same runner. Each step is either a shell script or an action. Steps run in order and can depend on each other; since they share the same runner, data can be passed between steps (e.g., building an app in one step and testing it in the next). Jobs run in parallel by default, but you can set job dependencies so that one job waits for another to finish before starting.

### Solution of point 1:

Definition of workflow based on [GitHub website](https://docs.github.com/en/actions/about-github-actions/understanding-github-actions#workflows):

A workflow is an automated, configurable process designed to execute one or more tasks. It is specified using a YAML file stored in your repository. Workflows can be activated by various triggers, such as events within the repository, manual initiation, or a predetermined schedule.

Workflows are located in the .github/workflows directory of a repository. Each repository can host multiple workflows designed to carry out a distinct set of operations.


The commands to create the directory and the workflow file are:

```bash
mkdir -p .github/workflows
touch .github/workflows/weather-data.yml
```

### Solution of the point 2:

##### The YAML Workflow syntax for GitHub Actions.

To edit `weather-data.yml`, use your favourite editor, such as VS Code.

* `name` (not required) defines the name of the workflow. More details about it can be found on [GitHub Docs](https://docs.github.com/en/enterprise-server@2.22/actions/learn-github-actions/workflow-syntax-for-github-actions#name).

The syntax is:
```YAML
name: Weather Data Processing
```

* `on` (required) names the GitHub event that triggers the workflow. A single event, a list of events, a list of event types, or a configuration map can be given. This map can schedule a workflow or limit the workflow to run on specific files, tags, or branch changes.

In the project, `on.schedule` is used to schedule the workflow to run at specific UTC times. Visit 
[GitHub](https://docs.github.com/en/enterprise-server@2.22/actions/learn-github-actions/workflow-syntax-for-github-actions#onschedule) to find out more about the schedule.<br>
`workflow_dispatch` is also used so that workflow runs can be triggered manually. For more information, look into 
[GitHub Docs](https://docs.github.com/en/enterprise-server@2.22/actions/learn-github-actions/events-that-trigger-workflows#workflow_dispatch).


The syntax is:
```YAML
on:
  schedule:
    - cron:  '0 10 * * *'
  workflow_dispatch:
```

Cron syntax consists of five space-separated fields, each representing a unit of time.

* The first field represents the minute (0 - 59)
* The second field represents the hour (0 - 23)
* The third field represents the day of the month (1 - 31)
* The fourth field represents the month (1 - 12 or JAN-DEC)
* The fifth field represents the day of the week (0 - 6 or SUN-SAT)


These operators can be used in any of those five fields:

| Operator | Description | Example |
|---|---|---|
| `*` | Any value | `10 * * * *` runs at every minute 10 of every hour of every day. |
| `,` | Value list separator | `1,11 3,4 * * *` runs at minute 1 and 11 of the 3rd and 4th hour of every day. |
| `-` | Range of values | `35 2-6 * * *` runs at minute 35 of the 2nd, 3rd, 4th, 5th, and 6th hour. |
| `/` | Step values | `30/12 * * * *` runs every 12 minutes starting from minute 30 through 59 (minutes 30, 42, and 54). |

The above information comes from: Events that trigger workflows - [GitHub Docs](https://docs.github.com/en/enterprise-server@2.22/actions/learn-github-actions/events-that-trigger-workflows#schedule).
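To make the five fields concrete, the expression used in this project can be split into its parts in the shell. This is only an illustrative sketch: the variable names are hypothetical, and the snippet does not schedule anything; it simply prints which value lands in which field.

```bash
cron='0 10 * * *'

# read splits the line on whitespace into the five fields; the here-document
# passes the expression as input without the shell expanding the asterisks.
read -r minute hour day_of_month month day_of_week <<EOF
$cron
EOF

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $day_of_month"
echo "month:        $month"
echo "day of week:  $day_of_week"
```

Here the minute is `0` and the hour is `10`, while the remaining three fields are `*`, so the workflow is scheduled for 10:00 UTC every day.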

### Solution of point 3:

To use a GitHub-hosted runner, define a job and set the `runs-on` key to specify the runner type, in this case `ubuntu-latest`. Based on [Stack Overflow website](https://stackoverflow.com/questions/69840694/what-does-ubuntu-latest-mean-for-github-actions#:~:text=ubuntu%2Dlatest%20refers%20to%20an%20image%20of%20the%20latest%20version%20of%20ubuntu%20(a%20linux%20operating%20system)%20which%20can%20be%20used%20to%20run%20GitHuB%20Actions.%20It%20comes%20with%20a%20lot%20of%20other%20software%20preinstalled%20and%20ready%20to%20use.) `ubuntu-latest` is an image of the latest Ubuntu version, preloaded with various software for running GitHub Actions.

When the job starts, GitHub provisions a new VM. All steps run on this VM, allowing them to share data via the filesystem. Once the job is complete, the VM is automatically decommissioned.


The job is declared under the top-level `jobs:` key; here it is named `run_weather_script`. The syntax is:
```YAML
run_weather_script:
  runs-on: ubuntu-latest
```

### Solution of point 4:

The syntax is:
```YAML
- name: Checkout repository
  uses: actions/checkout@v3
```
This step belongs in the `steps:` list of the job defined under `jobs:`.

During a manual test of the workflow, the following warning was noticed: "The following actions use a deprecated Node.js version and will be forced to run on node20: actions/setup-node@v3, actions/checkout@v3."
After updating to actions/checkout@v4, the deprecation warning disappears and the workflow benefits from the latest features and fixes.

The updated syntax is:
```YAML
- name: Checkout repository
  uses: actions/checkout@v4
```

### Solution of point 5:

The syntax is:
```YAML
- name: Run weather.sh script
  run: |
    chmod +x weather.sh
    ./weather.sh
```
This step belongs in the `steps:` list of the job, directly after the "Checkout repository" step.

### Solution of point 6:

The syntax is:
```YAML
- name: Commit and push changes
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add .
    git commit -m "Update weather data" || echo "No changes to commit"
    git push
```
Make sure the workflow has permission to commit and push changes to the repository by adding the following block, either at the top level of the workflow file or inside the job:
```YAML
permissions:
  contents: write
```
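For orientation, the fragments from points 1 to 6 can be put together into one file. The sketch below writes such a file with a here-document. It is an assumed assembly, not necessarily the exact file in the repository: the order of the keys, the placement of `permissions` at the top level, the commit message, and the use of a temporary directory are all choices made for this illustration.

```bash
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p .github/workflows

# The quoted 'EOF' stops the shell from expanding ${{ ... }} inside the file body.
cat > .github/workflows/weather-data.yml <<'EOF'
name: Weather Data Processing

on:
  schedule:
    - cron: '0 10 * * *'
  workflow_dispatch:

permissions:
  contents: write

jobs:
  run_weather_script:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Run weather.sh script
        run: |
          chmod +x weather.sh
          ./weather.sh

      - name: Commit and push changes
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          git commit -m "Update weather data" || echo "No changes to commit"
          git push
EOF

cat .github/workflows/weather-data.yml
```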




### Solution of point 7: