# Introduction to Data Processing in Python using Windy Birthdays!

<div class = "alert alert-info" role = "alert" 
     style = "font-size: 1.1em; padding: 15px; margin: 0px 0; text-align: left">

     In this tutorial, we will:
       1st ▷ download meteorological data from Rame Head NCI
       2nd ▷ use Python to help us assess wind and temperature measurements.

</div>

### `▶▶` [LINK](http://www.nci-ramehead.org.uk/weather/archive/) `◀◀` to the RH archive where you will download a 2-day txt file that includes your most recent birthday 

<div class="alert alert-info" role="alert"
     style="color:#000; font-size:1.1em; background-color:white; padding:10px; margin:1px; text-align:left;">
    
`How would we instruct a computer to assess wind and temperature measurements from the Rame Head weather measurements for us?` 

When you're still learning to code, it helps to first write out the steps you would need to analyse your data. Code - whether it is Python or R - is a type of instructive language that a computer "understands". In other words, coding is the process of giving your computer clear instructions so that it can do your work for you.
    
-----


#### ▶ Python, follow these steps:
1. Import the libraries needed for (i) data analysis (`numpy`, `pandas`) and (ii) plotting (`matplotlib`).
1. Load the Rame Head weather data into a neat table that is easy to work with, say a `dataframe` called weather_df.
1. Let me LOOK at the information inside the dataframe I just made:
    - was this import successful?
    - are my successfully imported data in an ideal format?
1. If no, help me to fix my problems, and make a new dataframe of the good data called bday_df.
1. Help me do some basic analysis, like find the `min` and `max` values of wind speed measured on my birthday.
1. Let me see variations through the day with a `line plot` of temperature and wind speeds from my birthday.
1. Show me distributions of temperature and wind speeds from my birthday with a `boxplot`.

</div>

### Step 1 ▶ ALWAYS first import the `libraries` you'll need -- basic setup

In [1]:
# Data Handling and Manipulation
import os
import numpy as np
import pandas as pd

# Data Visualisation and Plots
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Suppress warnings (not errors)
import warnings
warnings.filterwarnings('ignore')

### Step 2 ▶ Import the data into a dataframe using `pandas` `read_csv` function

In [None]:
# First: - Give directions to your data (set the path) 
#        - Give your long filename a short name, file

path = "data/"
file = 'P2_TestData_Rame_Head_Weather.txt'
# David Attenborough's birthday: 8th May

# Then: Import your data from txt file into a pandas dataframe, weather_df
weather_df = pd.read_csv(os.path.join(path, file), delimiter = '\t') # txt: TAB delimited, tell Python to separate by '\t'


### Step 3 ▶ Use `print()` to look inside dataframe, and check success

In [None]:
# Inspect your df using 'print'
print(weather_df)

<div class="alert alert-info" role="alert" 
     style="font-size: 1.em; padding: 10px; margin: 0px 0; text-align: left;">

     Oh no... 
     That doesn't look good!
    
       - First, open your birthday text file using Windows File Explorer (or Finder on a Mac). 
       - You'll see that Rame Head have tried to make the data they provide look nice by splitting header names over 
         two lines and adding a line of dashes before the numerical data starts.
       - The columns are also padded with a varying number of spaces rather than separated by tabs: neat to look at, 
         but irregular.
       ★ Unfortunately, Python is going to struggle with this ★
      
     There are a few ways to fix things. 
       - One neat option is to selectively import the columns we want by number (count). 
       - REMEMBER: Python always starts counting from zero, not one.
       - Column 0: 'Date'; Column 1: 'Time'; Column 2: 'Temp Out'; Column 7: 'Wind Speed'
       

</div>
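
The difference between a tab-delimited and a whitespace-delimited read can be tried out on a tiny sample before touching the real file. This is a minimal sketch: `sample_text` and every value in it are made up for illustration (they are not real Rame Head measurements), and the layout only imitates the one described above, with two header lines, a line of dashes, then space-aligned columns.

```python
import io
import pandas as pd

# Hypothetical, made-up sample imitating the layout described above
sample_text = (
    "                  Temp     Hi    Low    Out    Dew   Wind\n"
    "  Date    Time     Out   Temp   Temp    Hum    Pt.  Speed\n"
    "---------------------------------------------------------\n"
    " 8/05/25   0:05   10.1   10.2   10.1     85    7.7    4.0\n"
    " 8/05/25   0:10   10.0   10.1   10.0     86    7.8    3.6\n"
)

# Look at the raw lines first: the header text is split over two lines
for line_number, line in enumerate(sample_text.splitlines()):
    print(line_number, repr(line))

# Tab-delimited read: there are no tabs, so every line lands in ONE column
tab_df = pd.read_csv(io.StringIO(sample_text), delimiter='\t')
print(tab_df.shape)

# Whitespace-delimited read, skipping the 3 non-data lines
# and keeping only columns 0, 1, 2 and 7 (counting from zero)
sample_df = pd.read_csv(io.StringIO(sample_text), sep=r'\s+', skiprows=3,
                        usecols=[0, 1, 2, 7], header=None)
sample_df.columns = ['Date', 'Time', 'Temp', 'Wind']
print(sample_df)
```

Printing each line with `repr()` makes the spaces visible, which is a quick way to check whether a file really contains tabs (`\t`) or just spaces.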

### Step 2 and 3 (again) ▶ Import the data into a dataframe and look inside

In [None]:
# Column numbers (count from 0)
columns_to_read = [0, 1, 2, 7]   

# Read in data from txt file to make a weather_df
weather_df = pd.read_csv(os.path.join(path, file),         # - path to, and name of, your data file.
                         sep = r'\s+',                     # - tells pandas that columns are separated by whitespace, not tab/comma.
                         skiprows = 3,                     # - skips the first 3 rows (two header lines and the line of dashes).
                         usecols = columns_to_read,        # - tells pandas to only import the columns we pre-defined (0, 1, 2, 7).
                         header  = None)                   # - prevents pandas from treating the first row as a header (column name).

# Assign headers to columns
weather_df.columns = ['Date', 'Time', 'Temp', 'Wind']

# Display results
print(weather_df)

<div class="alert alert-info" role="alert" 
     style="font-size: 1.1em; padding: 10px; margin: 0px 0; text-align: center">
    
    ✨ SUCCESS ✨
</div>


### Step 4 ▶ Data Handling and Manipulation

In [None]:
# How many unique days are in our file?
num_days = weather_df['Date'].nunique()

# Print the result
print(f"There are {num_days} days of data in the file, but we only want one - the birthday date.")

<div class = "alert alert-info" role = "alert" 
     style = "color:#000; font-size: 1.1em; padding: 10px; margin: 0px 0; text-align: left">


**☆ Your Input Needed Here: ☆**
    
Replace the date string assigned to `birthday` in the next cell (the '`date_from_df`' placeholder) with a date exactly as it appears in your dataframe (df) -- direct `copy` and `paste` is recommended.
   
</div>

In [None]:
birthday = '8/05/25'

# Can you guess the Data Type held by variable 'birthday'?
print(type(birthday))


In [None]:
# Filter 2-day weather_df for rows where 'Date' is 'birthday'
# to make the new birthday-only dataframe, bday_df.
# .copy() makes bday_df independent of weather_df, so new columns can be added to it safely.

bday_df = weather_df[weather_df['Date'] == birthday].copy()

# Show bday_df
print(bday_df)

<div class="alert alert-info" role="alert" 
     style="font-size: 1.em; padding: 10px; margin: 0px 0; text-align: left">
    
     uhm, what just happened here? 
     Oh this? You just used Boolean masking, nbd.

       - weather_df['Date'] == birthday makes a Boolean Series (revisit P0: Data Types)
         • 'True' for all rows where Date matches the value stored in variable birthday.
         • 'False' for all the rows where Date is NOT an exact match.
         
       - Your new dataframe, bday_df, contains only the rows from weather_df where the Date column exactly matches birthday.
       

</div>
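
Boolean masking is easier to follow on a table small enough to read at a glance. The sketch below uses a hypothetical three-row table (`toy_df`, with invented values) to show the two steps separately: first the comparison that builds the mask, then the indexing that keeps only the `True` rows.

```python
import pandas as pd

# Hypothetical toy table (values invented for illustration)
toy_df = pd.DataFrame({
    'Date': ['7/05/25', '8/05/25', '8/05/25'],
    'Wind': [2.0, 5.5, 3.1],
})

# Step 1: the comparison returns one True/False value per row
mask = toy_df['Date'] == '8/05/25'
print(mask.tolist())

# Step 2: indexing with the mask keeps only the rows marked True
# (.copy() makes an independent table that is safe to add columns to)
toy_bday_df = toy_df[mask].copy()
print(toy_bday_df)

# Counting the True values is a quick check that the date string matched
print(mask.sum(), "matching rows")
```

If the count is zero, the most likely cause is that the date string does not match the text in the `Date` column character for character.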


### Step 5 ▶ Basic Data Analysis, including use of the `pandas` `to_datetime` function to fix Dates

In [None]:
# Combine Date and Time into a new datetime (DTime) column
bday_df['DTime'] = pd.to_datetime(bday_df['Date'] + ' ' + bday_df['Time'], 
                                 format = '%d/%m/%y %H:%M')
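
The `format` string tells pandas how to read each part of the text. As a minimal sketch, assuming dates written as day/month/2-digit-year and times as hour:minute (the `dates` and `times` values below are hypothetical), the codes can be checked by converting two strings and pulling the parts back out:

```python
import pandas as pd

# Hypothetical date and time strings in day/month/2-digit-year style
dates = pd.Series(['8/05/25', '8/05/25'])
times = pd.Series(['0:05', '13:30'])

# %d = day, %m = month, %y = 2-digit year, %H = hour (24 h clock), %M = minute
dtime = pd.to_datetime(dates + ' ' + times, format='%d/%m/%y %H:%M')
print(dtime)

# Pull the parts back out to confirm day and month were not swapped
print(dtime.dt.day.tolist())
print(dtime.dt.month.tolist())
print(dtime.dt.hour.tolist())
```

Stating the format explicitly matters here because a string like `8/05/25` is ambiguous on its own: it could be read as 8 May or 5 August.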

In [None]:
# What were the min, max wind speeds on your birthday?
min_wind = bday_df['Wind'].min()
max_wind = bday_df['Wind'].max()

# Find the index of min, max wind speeds
min_wind_ind = bday_df['Wind'].idxmin()
max_wind_ind = bday_df['Wind'].idxmax()

# Find the corresponding Date-Time (DTime) using .loc[row label, column name]
time_min = bday_df.loc[min_wind_ind, 'DTime']
time_max = bday_df.loc[max_wind_ind, 'DTime']

# Output min, max wind speeds with print
print("On my birthday:")
print(f' - the min wind speed of {min_wind:.1f} m/s was measured at {time_min.strftime("%H:%M")}')
print(f' - the max wind speed of {max_wind:.1f} m/s was measured at {time_max.strftime("%H:%M")}')

---
### Step 6 ▶ `Line plot` of temperature and wind from your birthday

<div class = "alert alert-info" role = "alert" 
     style = "color:#000; font-size: 1.1em; padding: 10px; margin: 0px 0; text-align: left">


**☆ Your Input Needed Here: ☆**
    
Replace the `ylim (min, max)` with axis limits that show off your data best.
   
</div>
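
One way to pick starting values for the limits is to let the data suggest them. The sketch below is only an illustration: `toy_df` holds invented measurements, and the 10 % padding is an arbitrary choice rather than a rule.

```python
import pandas as pd

# Hypothetical measurements (invented for illustration)
toy_df = pd.DataFrame({'Temp': [9.8, 11.2, 14.6, 12.3],
                       'Wind': [1.3, 4.0, 7.6, 5.2]})

# Smallest and largest value across BOTH plotted columns
data_min = toy_df[['Temp', 'Wind']].min().min()
data_max = toy_df[['Temp', 'Wind']].max().max()

# Add 10 % of the data range as padding so points do not sit on the frame
padding = 0.1 * (data_max - data_min)
y_min = data_min - padding
y_max = data_max + padding
print(f"Suggested limits: ({y_min:.1f}, {y_max:.1f})")

# These values could then be passed to the axis, e.g. ax.set_ylim(y_min, y_max)
```

With your own data, the same lines can be run on `bday_df` in place of `toy_df`, and the result rounded to tidy numbers.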

In [None]:
# Create fig with axes, size 15 x 5 (width, height) in inches
fig1, ax = plt.subplots(figsize = (15, 5))

# Plot temp data using 's' square markers in red
ax.plot(bday_df['DTime'], bday_df['Temp'], linestyle = ':', marker = 's', markersize = 1, color = 'red', 
        label = 'Temperature (°C)')
# Plot wind speed using 'o' round markers in blue
ax.plot(bday_df['DTime'], bday_df['Wind'], linestyle = ':', marker = 'o', markersize = 2, color = 'blue',
        label = 'Wind Speed (m/s)')

# Format x-axis:
# 1. Set x-ticks for every 1 hour using mdates hour locator
ax.xaxis.set_major_locator(mdates.HourLocator(interval = 1))

# 2. Set x-ticks to only show HH:MM (hour:min) with no date
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M' ))

# 3. Angle x-ticks labels, rotation=n degrees
plt.setp(ax.get_xticklabels(), rotation = 30)

# 4. Add bold x-axis label
ax.set_xlabel('Time', fontweight = 'bold')

# Format y-axes:
# Set min, max limits
ax.set_ylim(-30 , 30)

# Format figure
# Add gridlines
ax.grid(True, color = 'lightgrey', 
        linestyle = ':', linewidth = 0.5)

# Add a header
ax.set_title(f"My Birthday Weather: {birthday}", fontsize = 11, weight = 'bold')

# Show legends
ax.legend()

# Show the plot
plt.show()

In [None]:
# Save your figure by uncommenting the next line of code
#fig1.savefig('Birthday_LinePlot.png', dpi = 300, bbox_inches = 'tight')

### Step 7 ▶ `Boxplot` of temperature and wind from your birthday

In [None]:
# Create your figure (fig2) and axis (ax)
fig2, ax = plt.subplots(figsize = (6, 4))

# Plot data, then label each box on the x-axis
ax.boxplot([bday_df['Temp'], bday_df['Wind']])
ax.set_xticklabels(['Temperature', 'Wind Speed'])

# Add gridlines
ax.grid(True, linestyle = ':', linewidth = 0.5, color = 'grey')

# Add a figure title
ax.set_title(f"Birthday Weather in Plymouth: {birthday}", 
             fontsize= 11, fontweight = 'bold')

# Show the plot
plt.show()

In [None]:
# Save your figure by uncommenting the next line of code
#fig2.savefig('Birthday_Boxplots.png', dpi = 300, bbox_inches = 'tight')
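
To read a boxplot it helps to know which numbers it draws. With matplotlib's default settings, the box spans the first to third quartile, the line inside marks the median, and points further than 1.5 times the interquartile range (IQR) from the box are drawn as individual outliers. The sketch below reproduces those numbers with pandas for a hypothetical wind series (values invented for illustration):

```python
import pandas as pd

# Hypothetical wind speeds with one unusually strong gust
wind = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0])

# Box edges and median line
q1 = wind.quantile(0.25)
median = wind.quantile(0.50)
q3 = wind.quantile(0.75)

# Interquartile range and the 1.5 x IQR fences used to flag outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the points drawn individually
outliers = wind[(wind < lower_fence) | (wind > upper_fence)]

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print("Outliers:", outliers.tolist())
```

Running the same lines on `bday_df['Wind']` or `bday_df['Temp']` gives the numbers behind each box in your own figure.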

<div class="alert alert-info" role="alert" 
     style="font-size: 1.2em; padding: 10px; margin: 10px 0; text-align: center;">
    
    Well done on successfully using Python to plot and assess weather data from 
    ✨your special day✨
</div>

<div class="alert alert-info" role="alert"
     style="color:#000; font-size:1.1em; padding:10px; margin:1px; text-align:left;">
    
### `▢ Did you know? ▢`

 • The Rame Head station was opened in May 1998 and is part of the National Coastwatch Institution (NCI). 
 
 • NCI Rame Head is one of 50 NCI stations operating around the British Isles.

 `▢` Learn more here:
http://www.nci-ramehead.org.uk/weather/History_Vantage_Pro.htm

</div>