**Computer Infrastructure - Assessment** 
 
**Author: Carlos Rigueti**

This Jupyter Notebook serves as documentation for the Computer Infrastructure assessment in the Higher Diploma in Data Analytics course at ATU. It describes the process of completing tasks related to directory organization, timestamp handling, and automated weather data retrieval through command-line tools and Bash scripting. The report elaborates on the technical commands employed, their functions, and their integration into an automated workflow, showcasing practical system management and data processing techniques.

---


**Task 1: Create Directory Structure**  
**Objective:** Organize files in a logical directory hierarchy for efficient data storage and access.

**Steps:**
- Created a root folder named "data" for this project.
- Created two subdirectories, "timestamps" and "weather," to organize time-related and weather-related data.

**Commands:**
```bash
mkdir -p data/timestamps data/weather
```

**Explanation:**  
The `-p` flag tells `mkdir` to create any missing parent directories (here, `data`) along with the listed subdirectories, and to exit without error if they already exist.
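
As a quick check of that behaviour, the sketch below runs the command twice and then lists the resulting directories. It assumes a POSIX-style shell with `find` and `sort` available; the second `mkdir -p` call is there only to show that repeating it is harmless.

```bash
# Running mkdir -p twice is safe: the second call finds the
# directories already present and exits without error
mkdir -p data/timestamps data/weather
mkdir -p data/timestamps data/weather

# List every directory under data/ to confirm the layout
find data -type d | sort
```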

**Task 2: Capturing Timestamps**  
**Objective:** Log the current date and time into a file repeatedly to simulate time-based data collection.

**Steps:**
- Navigated to the "data/timestamps" directory.
- Used the `date` command to capture the current date and time and append it to "now.txt."
- Ran the command ten times so that "now.txt" holds ten timestamps, one per line, with no entry overwritten.

**Commands:**
```bash
cd data/timestamps
date >> now.txt
```

**Explanation:**  
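The `>>` operator appends the output to "now.txt" (creating the file if it does not exist) and preserves existing content, whereas a single `>` would overwrite the file on every run.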
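
The ten repetitions do not have to be typed by hand. A minimal sketch, assuming `seq` is available (it is part of GNU coreutils), wraps the same command in a loop:

```bash
# Append the current date and time to now.txt ten times
for i in $(seq 1 10); do
    date >> now.txt
done

# Count the lines to confirm the entries were appended, not overwritten
wc -l < now.txt
```

Without a pause, several entries may share the same second; adding `sleep 1` inside the loop would space them out.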


**Task 3: Formatting Timestamps**  
**Objective:** Generate timestamps in a readable and sortable format suitable for file naming.

**Steps:**
- Used the `date` command with a custom format string to produce timestamps in the format `YYYYmmdd_HHMMSS`.
- Appended the formatted timestamps to a file called "formatted.txt."

**Commands:**
```bash
date "+%Y%m%d_%H%M%S" >> formatted.txt
man date
```

**Explanation:**  
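In `date "+%Y%m%d_%H%M%S"`, the `+` introduces a format string: `%Y` is the four-digit year, `%m` the month, `%d` the day, `%H` the hour (24-hour clock), `%M` the minute, and `%S` the second. Because the fields are zero-padded and run from most to least significant, the resulting strings sort chronologically. `man date` lists all available format specifiers.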
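
To illustrate why this format suits file naming, the sketch below sorts three made-up timestamps (illustrative values, not taken from "formatted.txt") as plain text. Because the fields are zero-padded and ordered from year down to second, the alphabetical order should match the chronological one.

```bash
# Three sample timestamps in YYYYmmdd_HHMMSS form, deliberately out of order
sorted=$(printf '%s\n' 20241118_093936 20241117_235959 20241118_000001 | sort)

# Plain text sorting returns them oldest first
echo "$sorted"
```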

**Task 4: Creating Timestamped Files**  
**Objective:** Automate the creation of files with timestamp-based names for event tracking.

**Steps:**
- Combined the `touch` command with the `date` command to dynamically generate and create uniquely named files.

**Commands:**
```bash
touch "$(date "+%Y%m%d_%H%M%S").txt"
```

**Explanation:**  
The `$(...)` syntax is command substitution: the shell runs `date` first and inserts its output into the filename before `touch` executes.
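
When the same timestamp is needed more than once (for example, to create the file and then report its name), the substitution can be stored in a variable first. A small sketch, with `stamp` as an illustrative variable name:

```bash
# Capture the timestamp once so every later use refers to the same value
stamp=$(date "+%Y%m%d_%H%M%S")

# Create the empty file and report its name
touch "${stamp}.txt"
echo "Created ${stamp}.txt"
```

Calling `date` separately in each command could yield different values if the clock ticks over between them.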

**Task 5: Downloading Today’s Weather Data**  
**Objective:** Retrieve real-time weather data from an online source and save it locally.

**Steps:**
- Navigated to the "data/weather" directory.
- Used the `wget` command to download weather data and save it as "weather.json."

**Commands:**
```bash
wget -O weather.json https://prodapi.metweb.ie/observations/athenry/today
```

**Explanation:**  
The `-O` (capital O) option writes the download to "weather.json"; without it, `wget` would name the file after the last part of the URL (`today`).

**Task 6: Timestamping Weather Data**  
**Objective:** Save downloaded weather data with unique, timestamped filenames to prevent overwriting.

**Steps:**
- Modified the `wget` command to include a dynamically generated timestamp in the filename.

**Commands:**
```bash
wget -O "weather_$(date "+%Y%m%d_%H%M%S").json" https://prodapi.metweb.ie/observations/athenry/today
```

**Explanation:**  
Embedding `$(date ...)` in the filename gives each download a distinct name (to the second), so earlier downloads are not overwritten.

**Task 7: Writing a Bash Script**  
**Objective:** Automate the process of downloading and saving timestamped weather data with a reusable script.

**Steps:**
- Created a Bash script named "weather.sh."
- Wrote a script that performs all steps from Task 6.
- Made the script executable and tested its functionality.

**Commands:**
```bash
nano weather.sh
chmod u+x weather.sh
./weather.sh
```

**Script Content:**
```bash
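#!/bin/bash
# Directory where downloads are stored (created if missing)
TARGET_DIR="data/weather"
mkdir -p "$TARGET_DIR"
# Timestamped filename so each run writes a new file
FILENAME="$TARGET_DIR/weather_$(date "+%Y%m%d_%H%M%S").json"
# Download today's observations for Athenry into that file
wget -O "$FILENAME" https://prodapi.metweb.ie/observations/athenry/today
echo "Weather data saved as $FILENAME"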
```

**Explanation:**  
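This script automates directory creation, timestamp generation, and file downloading in a single step. Because `TARGET_DIR` is a relative path, the script should be run from the project root so that files land in the existing "data/weather" directory.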
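
The script as written prints its success message even if the download fails, and `wget -O` may leave an empty or partial file behind in that case. One possible hardening, sketched here rather than taken from the assessment, wraps the steps in a function that checks the exit status of `wget`. The function name `fetch_weather` and its two arguments are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: download a URL into a timestamped file inside
# target_dir, removing the file again if wget reports a failure
fetch_weather() {
    local target_dir="$1"
    local url="$2"
    local filename

    mkdir -p "$target_dir" || return 1
    filename="$target_dir/weather_$(date "+%Y%m%d_%H%M%S").json"

    if wget -q -O "$filename" "$url"; then
        echo "Weather data saved as $filename"
    else
        echo "Download failed: $url" >&2
        rm -f "$filename"   # discard the empty or partial file
        return 1
    fi
}

# Example call (requires network access):
# fetch_weather data/weather https://prodapi.metweb.ie/observations/athenry/today
```

Returning a non-zero status on failure lets a calling script or scheduler detect that no new data was stored.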


**Task 8: Notebook for Reporting**  
**Objective:** Document all tasks with explanations and outputs in a Jupyter Notebook.

**Steps:**
- Created a file named "weather.ipynb."
- Provided descriptions of commands and steps for Tasks 1 to 7, including code snippets and outputs.

**Task 9: Data Analysis with Pandas**  
**Objective:** Load and analyze downloaded weather data using Python’s Pandas library.

**Steps:**
- Loaded a weather data file into a DataFrame using `pd.read_json()`.
- Examined the dataset structure, provided summaries, and analyzed key statistics.

**Code Example:**
```python
import pandas as pd

# Load the JSON file
weather_df = pd.read_json('data/weather/weather_20241118_093936.json')

# Display the first few rows
print(weather_df.head())

# Summary of the data structure
weather_df.info()  # info() prints directly and returns None

# Statistical overview
print(weather_df.describe())
```

**Output:**  
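The loaded file holds 10 observations from the Athenry station dated 18 November 2024, in 15 columns covering temperature, wind speed and direction, humidity, rainfall, and pressure. Temperature ranged from 6 to 8 (mean 7.1) and humidity from 95 to 99. The `windGust` column is read as text because missing values are recorded as `-`, so it does not appear in the statistical summary.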

**Conclusion**  
This assessment highlights the integration of practical system management techniques with data processing tasks, demonstrating effective methods for organizing, collecting, and analyzing data.

### Weather analysis


In [None]:
# Import pandas for DataFrame handling
import pandas as pd


In [19]:
# Load the JSON file
weather_df = pd.read_json('data/weather/20241118_093936.json')

In [20]:
# Summarize the data
print(weather_df.head())       # Display first 5 rows

      name  temperature symbol  weatherDescription               text  \
0  Athenry            8    46n         Light rain       "Light rain "   
1  Athenry            8    46n         Light rain   "Recent Drizzle "   
2  Athenry            7    05n        Rain showers      "Rain shower"   
3  Athenry            7    40n  Light rain showers      "Recent Rain"   
4  Athenry            7    15n          Fog / Mist             "Mist"   

   windSpeed windGust cardinalWindDirection  windDirection  humidity  \
0          6        -                    NW            315        97   
1          6        -                     N              0        97   
2          4        -                    NW            315        97   
3          2        -                     N              0        97   
4          2        -                     S            180        99   

   rainfall  pressure dayName       date reportTime  
0      0.01      1014  Monday 2024-11-18      00:00  
1      0.10      101

In [21]:
# General info about the dataset (info() prints directly and returns None)
weather_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   name                   10 non-null     object        
 1   temperature            10 non-null     int64         
 2   symbol                 10 non-null     object        
 3   weatherDescription     10 non-null     object        
 4   text                   10 non-null     object        
 5   windSpeed              10 non-null     int64         
 6   windGust               10 non-null     object        
 7   cardinalWindDirection  10 non-null     object        
 8   windDirection          10 non-null     int64         
 9   humidity               10 non-null     int64         
 10  rainfall               10 non-null     float64       
 11  pressure               10 non-null     int64         
 12  dayName                10 non-null     object        
 13  date    

In [23]:
# Summarize the data
print(weather_df.describe())   # Statistical summary of numerical columns

       temperature  windSpeed  windDirection   humidity   rainfall  \
count    10.000000  10.000000      10.000000  10.000000  10.000000   
mean      7.100000   5.900000     126.000000  96.700000   0.234000   
min       6.000000   2.000000       0.000000  95.000000   0.000000   
25%       7.000000   4.000000      90.000000  96.250000   0.010000   
50%       7.000000   6.000000      90.000000  97.000000   0.055000   
75%       7.000000   6.750000     157.500000  97.000000   0.550000   
max       8.000000  13.000000     315.000000  99.000000   0.700000   
std       0.567646   3.314949     111.848111   1.159502   0.323666   

          pressure                 date  
count    10.000000                   10  
mean   1013.100000  2024-11-18 00:00:00  
min    1011.000000  2024-11-18 00:00:00  
25%    1013.000000  2024-11-18 00:00:00  
50%    1013.000000  2024-11-18 00:00:00  
75%    1014.000000  2024-11-18 00:00:00  
max    1014.000000  2024-11-18 00:00:00  
std       0.994429               