## Rugby League Records 
This notebook is for exploring the data from https://rugbyleaguerecords.com

The data was scraped and saved as rugbyleaguerecords.csv.

The steps below are the same as on the data-cleaning page. Can you make them work for rugbyleaguerecords.csv?

# Be able to investigate client requirements for data analysis
# 2.4 Quantitative data analysis

* mean
* median
* standard deviation
* range
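As a minimal sketch of these four statistics in pandas, the block below uses the first five Sheffield Eagles home attendances from the 1984-85 season, as they appear in the crowd data later in this notebook. Note that pandas' `Series.std()` gives the *sample* standard deviation (`ddof=1`), whereas NumPy's `np.std()` defaults to the population version (`ddof=0`).

```python
import pandas as pd

# First five Sheffield Eagles home attendances from the 1984-85 season
attendance = pd.Series([1425, 1145, 1159, 1076, 826], name="Attendance")

summary = {
    "mean": attendance.mean(),
    "median": attendance.median(),
    "standard deviation": attendance.std(),  # sample standard deviation (ddof=1)
    "range": attendance.max() - attendance.min(),  # largest minus smallest
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```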

## Set up

In [1]:
try:
    import micropip
    await micropip.install(["pyoliteutils", "textblob"])
except:
    pass

In [2]:
from pyoliteutils import *
import pandas as pd

## Other useful Opponent / Match Data?

### Location of opponents, and distance to Sheffield

To put the data on a map we need latitude and longitude.

#### Possible Data Sources:
- https://api.postcodes.io/places?q=[query]
- Nominatim / GeoPy
    - https://geopy.readthedocs.io/en/stable/
    - https://nominatim.org/release-docs/latest/api/Search/ 
    - https://medium.com/@gopesh3652/geocoding-with-python-using-nominatim-a-beginners-guide-220b250ca48d 
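If you call postcodes.io directly rather than through a wrapper library, the query string needs encoding, because venue names contain spaces and commas. The sketch below builds a `/places` URL with the standard library and pulls the first `(latitude, longitude)` out of an already-parsed JSON response. The helper names `places_url` and `first_latlong` are hypothetical, and the response shape (`status`, `result` list, `latitude`/`longitude` keys) is assumed from the fields this notebook reads later.

```python
from urllib.parse import urlencode

POSTCODES_IO = "https://api.postcodes.io"


def places_url(place):
    """Build a postcodes.io /places search URL with the query safely encoded."""
    return f"{POSTCODES_IO}/places?{urlencode({'q': place})}"


def first_latlong(payload):
    """Return (latitude, longitude) from the first result of a parsed /places response."""
    results = payload.get("result") or []  # treat a missing or null result as no match
    if payload.get("status") == 200 and results:
        return results[0].get("latitude"), results[0].get("longitude")
    return None, None


print(places_url("Sheffield Olympic Legacy Park"))
```

With network access this could be combined as `first_latlong(requests.get(places_url("Sheffield")).json())`.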


## Stadium Data

In [3]:
stadiums = pd.read_csv("../data/eagles/stadiums.csv")
stadiums

| | Stadium | Postcode |
|---|---|---|
| 0 | OWLERTON | S6 2DE |
| 1 | MILLMOOR | S60 1HB |
| 2 | HEADINGLEY | LS6 3BR |
| 3 | BRAMALL LANE | S2 4QX |
| 4 | HILLSBOROUGH | S6 1SW |
| 5 | OAKWELL | S71 1ET |
| 6 | BELLE VUE | DN4 5DX |
| 7 | SALTERGATE | S40 4SX |
| 8 | TATTERSFIELD | DN4 5JW |
| 9 | THRUM HALL | HX1 4LG |


## Adding the Latitude & Longitude

In [4]:
from urllib.parse import quote

import postcodes_io_api


class Api2(postcodes_io_api.Api):
    """Extends postcodes_io_api.Api with the /places query that postcodes.io offers online."""

    def places_query(self, place):
        """
        Return postcodes.io data for a place name.

        * **:param place** - place name to look up, e.g. 'Sheffield'
        * **:return** - detailed data
        ```
        data = api.places_query('Sheffield')
        ```
        """
        # URL-encode the place name so spaces, commas and '&' survive the request
        url = '/places?q={place}'.format(place=quote(place))
        response = self._make_request('GET', url)
        data = self._parse_json_data(response.content.decode('utf-8'))
        return data


api = Api2()


def get_place(place):
    """Return (latitude, longitude) of the first postcodes.io place match, or (None, None)."""
    latitude = None
    longitude = None

    data = api.places_query(place)

    # Use the first item in the returned list, if there is one
    if data["status"] == 200 and data.get("result"):
        first = data["result"][0]
        if "latitude" in first:
            latitude = first["latitude"]
            longitude = first["longitude"]
    return latitude, longitude


def get_latlong(postcode):
    """Return (latitude, longitude) for a postcode, falling back to its outcode."""
    latitude = None
    longitude = None

    # Look up the full postcode
    data = api.get_postcode(postcode)
    if data["status"] != 200:
        # If the postcode lookup fails, try just the first part (the outcode, e.g. 'S6')
        data = api.get_outcode(postcode.split()[0])

    if data["status"] == 200 and "latitude" in data["result"]:
        latitude = data["result"]["latitude"]
        longitude = data["result"]["longitude"]
    return latitude, longitude


def get_latlongs(df):
    """Add Latitude/Longitude columns from a Postcode column (skipped if already present)."""
    if ("Latitude" not in df) and ("Postcode" in df):
        try:
            df[["Latitude", "Longitude"]] = df.apply(
                lambda row: get_latlong(row["Postcode"]), axis=1, result_type="expand"
            )
        except Exception as e:
            print('Postcode conversion failed: ' + str(e))
    return df


def get_places(df, field_name):
    """Add '<field_name> Latitude' and '<field_name> Longitude' columns by place-name lookup."""
    lat_col = field_name + " Latitude"
    lon_col = field_name + " Longitude"
    # Check for the columns this function actually creates, not a plain "Latitude" column
    if (lat_col not in df) and (field_name in df):
        try:
            df[[lat_col, lon_col]] = df.apply(
                lambda row: get_place(row[field_name]), axis=1, result_type="expand"
            )
        except Exception as e:
            print('Place conversion failed: ' + str(e))
    return df
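The same opponent towns appear many times in the crowd data, so each repeated name would otherwise trigger another HTTP request. A minimal sketch of memoising the lookup with `functools.lru_cache` is below; `slow_lookup` is a hypothetical stand-in for `get_place` so the example runs without network access.

```python
from functools import lru_cache

calls = []  # records each real lookup, to show the cache working


def slow_lookup(place):
    """Hypothetical stand-in for get_place(); the real one makes an HTTP request."""
    calls.append(place)
    return (53.38, -1.47) if place == "Sheffield" else (None, None)


# Wrap the lookup so repeated place names are answered from memory
cached_lookup = lru_cache(maxsize=None)(slow_lookup)

for town in ["Sheffield", "Doncaster", "Sheffield", "Sheffield"]:
    cached_lookup(town)

print(calls)
print(cached_lookup.cache_info())
```

In the notebook the equivalent would be `get_place = lru_cache(maxsize=None)(get_place)` before calling `get_places`. Be aware that failed lookups returning `(None, None)` are cached too, so they will not be retried until the cache is cleared with `cache_clear()`.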

In [5]:
stadiums = get_latlongs(stadiums)
stadiums

| | Stadium | Postcode | Latitude | Longitude |
|---|---|---|---|---|
| 0 | OWLERTON | S6 2DE | 53.406031 | -1.493303 |
| 1 | MILLMOOR | S60 1HB | 53.428758 | -1.369397 |
| 2 | HEADINGLEY | LS6 3BR | 53.816081 | -1.580617 |
| 3 | BRAMALL LANE | S2 4QX | 53.371341 | -1.469862 |
| 4 | HILLSBOROUGH | S6 1SW | 53.410844 | -1.500859 |
| 5 | OAKWELL | S71 1ET | 53.552266 | -1.468631 |
| 6 | BELLE VUE | DN4 5DX | 53.517626 | -1.10875 |
| 7 | SALTERGATE | S40 4SX | 53.238963 | -1.434745 |
| 8 | TATTERSFIELD | DN4 5JW | 53.509088 | -1.113845 |
| 9 | THRUM HALL | HX1 4LG | 53.721864 | -1.884001 |


## Calculating Distance from Sheffield

In [6]:
# https://towardsdatascience.com/calculating-distance-between-two-geolocations-in-python-26ad3afe287b
import haversine as hs

# (latitude, longitude) of Sheffield from the postcodes.io places lookup
sheffield_latlong = get_place("Sheffield")

# hs.haversine() takes two (lat, lon) tuples, so apply it one row at a time
stadiums['Miles from Sheffield'] = stadiums.apply(
    lambda row: hs.haversine(sheffield_latlong, (row["Latitude"], row["Longitude"]), unit=hs.Unit.MILES),
    axis=1,
)
stadiums

| | Stadium | Postcode | Latitude | Longitude | Miles from Sheffield |
|---|---|---|---|---|---|
| 0 | OWLERTON | S6 2DE | 53.406031 | -1.493303 | 1.918934 |
| 1 | MILLMOOR | S60 1HB | 53.428758 | -1.369397 | 5.185172 |
| 2 | HEADINGLEY | LS6 3BR | 53.816081 | -1.580617 | 30.307328 |
| 3 | BRAMALL LANE | S2 4QX | 53.371341 | -1.469862 | 0.773637 |
| 4 | HILLSBOROUGH | S6 1SW | 53.410844 | -1.500859 | 2.367659 |
| 5 | OAKWELL | S71 1ET | 53.552266 | -1.468631 | 11.729022 |
| 6 | BELLE VUE | DN4 5DX | 53.517626 | -1.10875 | 17.50178 |
| 7 | SALTERGATE | S40 4SX | 53.238963 | -1.434745 | 10.015773 |
| 8 | TATTERSFIELD | DN4 5JW | 53.509088 | -1.113845 | 17.01545 |
| 9 | THRUM HALL | HX1 4LG | 53.721864 | -1.884001 | 28.993035 |
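`hs.haversine()` works on one pair of points at a time, which is why it is wrapped in `apply`. As a sketch of a vectorised alternative, the haversine formula can be written with NumPy so it accepts whole columns at once. The Earth radius of 3958.8 miles is an assumption (the `haversine` package uses its own constant), so results may differ from the table above in the later decimal places, and the Sheffield point `(53.38, -1.47)` is a hypothetical value for illustration.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; libraries differ slightly


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; works on scalars or whole NumPy/pandas columns."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    # a = sin²(Δlat/2) + cos(lat1)·cos(lat2)·sin²(Δlon/2)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # distance = 2·R·arcsin(√a)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


# Owlerton and Headingley from the table above, against a hypothetical Sheffield point
lats = np.array([53.406031, 53.816081])
lons = np.array([-1.493303, -1.580617])
distances = haversine_miles(53.38, -1.47, lats, lons)
print(distances)
```

Applied to the DataFrame this would read `haversine_miles(sheffield_latlong[0], sheffield_latlong[1], stadiums["Latitude"], stadiums["Longitude"])`.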


In [7]:
#Save for later 
stadiums.to_csv("../data/eagles/stadiums_with_latlong.csv", index=False)

## Crowd / Attendance Data

In [8]:
crowds = pd.read_csv("../data/eagles/rugbyleaguerecords.csv")
## https://datascienceparichay.com/article/pandas-extract-year-from-datetime-column/
crowds

| | Season | Match Date | Home Score | Home Team | Away Score | Away Team | Competition | Attendance | Venue | Referee |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1984-85 | Sunday, 2nd September 1984 | 29 | Sheffield Eagles | 10 | Rochdale Hornets | League 2 | 1425 | Owlerton Stadium, Sheffield, England | Fred Lindop |
| 1 | 1984-85 | Sunday, 9th September 1984 | 14 | Sheffield Eagles | 18 | Fulham | League 2 | 1145 | Owlerton Stadium, Sheffield, England | Jeff Croft |
| 2 | 1984-85 | Thursday, 20th September 1984 | 13 | Runcorn Highfield | 6 | Sheffield Eagles | League 2 | | | |
| 3 | 1984-85 | Sunday, 23rd September 1984 | 6 | Sheffield Eagles | 13 | Salford | League 2 | 1159 | Owlerton Stadium, Sheffield, England | John McDonald |
| 4 | 1984-85 | Sunday, 30th September 1984 | 18 | Doncaster | 10 | Sheffield Eagles | League 2 | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1231 | 2024 | Sunday, 25th February 2024 | 16 | York Knights | 32 | Sheffield Eagles | Challenge Cup 4 (64) | LNER Community Stadium York England | Referee: Liam Rush | |
| 1232 | 2024 | Sunday, 3rd March 2024 | 26 | Sheffield Eagles | 10 | Batley Bulldogs | 1895 Cup Quarter Final | 727 | Sheffield Olympic Legacy Park, Sheffield, England | Marcus Griffiths |
| 1233 | 2024 | Saturday, 9th March 2024 | 12 | Swinton Lions | 14 | Sheffield Eagles | Challenge Cup 5 (32) | 412 | Heywood Road, Sale, England | Marcus Griffiths |
| 1234 | 2024 | Friday, 15th March 2024 | 24 | Sheffield Eagles | 22 | Toulouse Olympique | League 2 | 794 | Sheffield Olympic Legacy Park, Sheffield, England | Michael Smaill |
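Some scraped rows are shifted: in row 1231 the venue text sits in `Attendance` and the referee in `Venue`, which also keeps `Attendance` from being read as numbers. A minimal sketch of flagging those rows and coercing `Attendance` to numeric is below, using three rows copied from the output above. Treating a `Venue` that starts with `"Referee:"` as the marker of a shifted row is an assumption based only on the rows visible here.

```python
import pandas as pd

# Three rows copied from the output above; the York row has its columns shifted
sample = pd.DataFrame({
    "Attendance": ["1425", None, "LNER Community Stadium York England"],
    "Venue": ["Owlerton Stadium, Sheffield, England", None, "Referee: Liam Rush"],
})

# A Venue beginning "Referee:" means the fields slid one column to the left
shifted = sample["Venue"].str.startswith("Referee:", na=False)

# Blank or shifted text becomes NaN instead of breaking numeric analysis
sample["Attendance"] = pd.to_numeric(sample["Attendance"], errors="coerce")

print(sample[shifted])
print(sample["Attendance"].mean())
```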


In [9]:
#home games

In [10]:
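# .copy() gives an independent DataFrame, so adding columns later doesn't raise SettingWithCopyWarning
crowds = crowds[crowds["Home Team"] == "Sheffield Eagles"].copy()
crowds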

| | Season | Match Date | Home Score | Home Team | Away Score | Away Team | Competition | Attendance | Venue | Referee |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1984-85 | Sunday, 2nd September 1984 | 29 | Sheffield Eagles | 10 | Rochdale Hornets | League 2 | 1425 | Owlerton Stadium, Sheffield, England | Fred Lindop |
| 1 | 1984-85 | Sunday, 9th September 1984 | 14 | Sheffield Eagles | 18 | Fulham | League 2 | 1145 | Owlerton Stadium, Sheffield, England | Jeff Croft |
| 3 | 1984-85 | Sunday, 23rd September 1984 | 6 | Sheffield Eagles | 13 | Salford | League 2 | 1159 | Owlerton Stadium, Sheffield, England | John McDonald |
| 6 | 1984-85 | Sunday, 14th October 1984 | 20 | Sheffield Eagles | 14 | Runcorn Highfield | League 2 | 1076 | Owlerton Stadium, Sheffield, England | Paul Volante |
| 8 | 1984-85 | Sunday, 28th October 1984 | 26 | Sheffield Eagles | 10 | Bridgend | League 2 | 826 | Owlerton Stadium, Sheffield, England | Cliff Hodgson |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1226 | 2023 | Sunday, 24th September 2023 | 16 | Sheffield Eagles | 17 | Bradford Bulls | League 2 | 1976 | Sheffield Olympic Legacy Park, Sheffield, England | Marcus Griffiths |
| 1227 | 2023 | Sunday, 1st October 2023 | 0 | Sheffield Eagles | 42 | London Broncos | Play-offs League 2 EPO | 648 | Sheffield Olympic Legacy Park, Sheffield, England | James Vella |
| 1229 | 2024 | Saturday, 10th February 2024 | 88 | Sheffield Eagles | 12 | Newcastle Thunder | Challenge Cup 3 (128) | 617 | Sheffield Olympic Legacy Park, Sheffield, England | Denton Arnold |
| 1232 | 2024 | Sunday, 3rd March 2024 | 26 | Sheffield Eagles | 10 | Batley Bulldogs | 1895 Cup Quarter Final | 727 | Sheffield Olympic Legacy Park, Sheffield, England | Marcus Griffiths |
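Once `Attendance` is numeric, the 2.4 statistics (mean, median, standard deviation, range) can be produced per season with `groupby` and `agg`. The sketch below uses the home attendances shown above for two seasons only; `value_range` is a hypothetical helper, since pandas has no built-in "range" aggregation, and `agg` names its column after the function.

```python
import pandas as pd

# Home attendances from the rows shown above (two seasons only, for illustration)
home = pd.DataFrame({
    "Season": ["1984-85"] * 5 + ["2024"] * 2,
    "Attendance": [1425, 1145, 1159, 1076, 826, 617, 727],
})


def value_range(s):
    """Range = largest minus smallest value."""
    return s.max() - s.min()


by_season = home.groupby("Season")["Attendance"].agg(["mean", "median", "std", value_range])
print(by_season)
```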


In [15]:
crowds["Venue"].unique()

array(['Owlerton Stadium, Sheffield, England',
       'Millmoor, Rotherham, England', 'Headingley, Leeds, England',
       'Bramall Lane, Sheffield, England',
       'Hillsborough, Sheffield, England',
       'Saltergate, Chesterfield, England',
       'Thrum Hall, Halifax, England', 'Belle Vue, Wakefield, England',
       'Tattersfield, Doncaster, England', 'Oakwell, Barnsley, England',
       'Don Valley Stadium, Sheffield, England', nan,
       'Cardiff Arms Park, Cardiff, Wales',
       'Neutral Venue: Headingley, Leeds, England',
       'Neutral Venue: Wembley Stadium, London, England',
       'KC Stadium, Hull, England', 'Clifton Lane, Rotherham, England',
       'Woodbourn Athletic Stadium, Sheffield, England',
       'Referee: Craig Halloran', 'Mount St Marys, Spinkhill, England',
       'Sheffield Hallam University Sports Ground, Sheffield, England',
       'Keepmoat Stadium, Doncaster, England',
       'Castle Park, Doncaster, England',
       'Neutral Venue: Bloomfield Road,

In [13]:
olp_latlong = get_place("Sheffield Olympic Legacy Park, Sheffield, England")
olp_latlong

(None, None)
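The full venue string returns `(None, None)`, which suggests the places endpoint matches place names rather than whole venue addresses. A sketch of turning a `Venue` value into simpler queries to try in turn is below: it strips the `"Neutral Venue: "` prefix seen in the list above, skips referee names that slid into the column, and returns the ground name followed by the town. The function name `venue_queries` is hypothetical.

```python
NEUTRAL_PREFIX = "Neutral Venue: "


def venue_queries(venue):
    """Return place names to try for a Venue string, most specific first."""
    # Missing venues, and referee names that slid into the Venue column, give nothing to look up
    if not isinstance(venue, str) or venue.startswith("Referee:"):
        return []
    if venue.startswith(NEUTRAL_PREFIX):
        venue = venue[len(NEUTRAL_PREFIX):]
    # "Ground, Town, Country" -> try the ground first, then the town
    parts = [part.strip() for part in venue.split(",")]
    return parts[:2]


print(venue_queries("Neutral Venue: Headingley, Leeds, England"))
```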

## Get town names from Opponent fields

In [12]:
def team_location(opponent):
    """Strip team-name parts that are not place names, e.g. 'Hull KR' -> 'Hull'."""
    location = opponent.replace("St Pats", "")
    location = location.replace("R-", "")
    location = location.replace("KR", "")
    location = location.replace("East", "")
    location = location.replace("Met", "")
    location = location.replace("Crusaders", "Wrexham")  # or Bridgend
    location = location.replace("Skolars", "London ")
    location = location.strip(" ")
    return location

# rugbyleaguerecords.csv has no "Opponents" column; for home games the opponent is the away team
crowds['Town'] = crowds["Away Team"].apply(team_location)
crowds
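Unlike the cleaning-data page, team names in rugbyleaguerecords.csv include club nicknames ("Rochdale Hornets", "York Knights"), which are unlikely to match a place lookup. A minimal sketch of dropping trailing nickname words is below. The `NICKNAMES` set is hypothetical, built only from team names visible in the outputs above, and would need extending for the full dataset.

```python
# Hypothetical list of club nicknames, taken from team names visible in this data
NICKNAMES = {"Hornets", "Bulldogs", "Knights", "Lions", "Bulls", "Broncos",
             "Thunder", "Olympique", "Highfield"}


def strip_nickname(team):
    """Drop trailing nickname words, e.g. 'Rochdale Hornets' -> 'Rochdale'."""
    words = team.split()
    # Keep at least one word so a bare nickname is never reduced to ""
    while len(words) > 1 and words[-1] in NICKNAMES:
        words.pop()
    return " ".join(words)


print(strip_nickname("Rochdale Hornets"))
```

It could be chained with the existing function as `crowds["Away Team"].apply(lambda team: team_location(strip_nickname(team)))`.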

## Geolocate the Towns

In [None]:
crowds = get_places(crowds, "Town")
crowds

In [None]:
# Show the rows whose town could not be geolocated
crowds[crowds['Town Latitude'].isnull()]



In [16]:
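from geopy.geocoders import Nominatim
from pprint import pprint

# Instantiate a new Nominatim client (Nominatim's usage policy asks for an identifying user_agent)
app = Nominatim(user_agent="UTC OLP")

# Place to look up; swap in another name to test it
# your_loc = "Sheffield Olympic Legacy Park, Sheffield, England"
your_loc = "Bradford"
location = app.geocode(your_loc)  # returns None when nothing matches

# Print the raw Nominatim result
if location is not None:
    pprint(location.raw)
else:
    print("No match for", your_loc)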

{'addresstype': 'city',
 'boundingbox': ['53.7243414', '53.9631510', '-2.0612483', '-1.6403301'],
 'class': 'boundary',
 'display_name': 'Bradford, West Yorkshire, England, United Kingdom',
 'importance': 0.5765366261116545,
 'lat': '53.7944229',
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. '
            'http://osm.org/copyright',
 'lon': '-1.7519186',
 'name': 'Bradford',
 'osm_id': 118323,
 'osm_type': 'relation',
 'place_id': 241965145,
 'place_rank': 16,
 'type': 'administrative'}
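Nominatim can find places that postcodes.io cannot, including towns outside the UK such as Toulouse. A sketch of trying several lookups in order and keeping the first hit is below. `uk_only` and `worldwide` are hypothetical stand-ins with made-up coordinates so the example runs offline; in the notebook the list would hold `get_place` and a Nominatim-based function.

```python
def first_match(place, lookups):
    """Try each (lat, lon) lookup in turn and return the first that finds the place."""
    for lookup in lookups:
        lat, lon = lookup(place)
        if lat is not None and lon is not None:
            return lat, lon
    return None, None


# Hypothetical stand-ins: the first knows only UK towns, the second also knows Toulouse
def uk_only(place):
    return (53.79, -1.75) if place == "Bradford" else (None, None)


def worldwide(place):
    return {"Bradford": (53.8, -1.8), "Toulouse": (43.6, 1.44)}.get(place, (None, None))


print(first_match("Toulouse", [uk_only, worldwide]))
```

A Nominatim lookup could be written as `loc = app.geocode(place)` returning `(loc.latitude, loc.longitude) if loc else (None, None)`. Nominatim's usage policy limits clients to about one request per second, which geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) can enforce.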


### Drop the towns outside the UK

In [None]:
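# Only drop rows whose town could not be geolocated; a plain dropna() would also
# drop matches with a missing attendance, venue or referee
crowds = crowds.dropna(subset=["Town Latitude", "Town Longitude"]).reset_index(drop=True)
crowds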

## Add the distance to Sheffield

In [None]:
crowds['Opponents Distance'] = crowds.apply(
    lambda row: hs.haversine(sheffield_latlong, (row["Town Latitude"], row["Town Longitude"]), unit=hs.Unit.MILES), axis=1, result_type="expand"
)
crowds


## Get Usable Date Information

In [None]:
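# "Match Date" looks like "Sunday, 2nd September 1984": drop the weekday and the
# ordinal suffix so the text matches the format "2 September 1984"
date_text = (
    crowds['Match Date']
    .str.replace(r"^\w+,\s*", "", regex=True)
    .str.replace(r"(\d+)(st|nd|rd|th)", r"\1", regex=True)
)
crowds['Date'] = pd.to_datetime(date_text, format="%d %B %Y")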
crowds['Year'] = crowds['Date'].dt.year
crowds['Day'] = crowds['Date'].dt.day_name()
crowds['Month'] = crowds['Date'].dt.month
crowds['Month Name'] = crowds['Date'].dt.month_name()
crowds

## Saving for later

In [None]:
crowds.to_csv("../data/eagles/crowds_clean.csv", index=False)

### Weather in Sheffield

Daily weather at each stadium's location would be more precise, but monthly data from Sheffield's weather station is easier to start with.

#### Possible Data Sources:

- https://www.metoffice.gov.uk/research/climate/maps-and-data/historic-station-data, saved as [sheffield_monthly_weather.csv](../data/eagles/sheffield_monthly_weather.csv)
- https://api.openweathermap.org/data/3.0/onecall/timemachine?lat=39.099724&lon=-94.578331&dt=1643803200&appid={API key}

In [None]:
sheffield_monthly_weather = pd.read_csv("../data/eagles/sheffield_monthly_weather.csv")
sheffield_monthly_weather

In [None]:
sheffield_monthly_weather.rename(columns={
    'yyyy' : "Year",
    'mm' : "Month",
    'tmax degC' : "Max Temp C",
    'tmin degC' : "Min Temp C",
    'rain mm' : "Rain mm",
},inplace=True)

sheffield_monthly_weather.drop(columns=["af days", "sun hours"], inplace=True)

sheffield_monthly_weather
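Met Office historic station files mark estimated values with a trailing `*`, use `---` for missing months and `#` on some sunshine values. Assuming those markers survived into the CSV, the columns will load as text rather than numbers. A minimal sketch of cleaning them is below; `to_number` is a hypothetical helper and the sample values are made up for illustration.

```python
import pandas as pd


def to_number(column):
    """Strip Met Office markers ('*' estimated, '#') and turn '---' into NaN."""
    cleaned = column.astype(str).str.strip().str.rstrip("*#")
    return pd.to_numeric(cleaned, errors="coerce")


# Hypothetical sample values in the Met Office style
sample = pd.DataFrame({"Rain mm": ["52.3", "61.0*", "---"], "Max Temp C": ["6.3", "7.1", "8.0"]})
for col in ["Rain mm", "Max Temp C"]:
    sample[col] = to_number(sample[col])

print(sample)
```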

In [None]:
#result = pd.concat([crowds, sheffield_monthly_weather], axis=1, join="inner")
crowds_with_monthly_weather = pd.merge(crowds, sheffield_monthly_weather, how="left", on=["Year", "Month"])
crowds_with_monthly_weather.to_csv("../data/eagles/crowds_with_monthly_weather.csv", index=False)
crowds_with_monthly_weather

In [None]:
crowds_with_monthly_weather[crowds_with_monthly_weather.isna().any(axis=1)]
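`isna().any(axis=1)` flags a row if *any* value is missing, including a missing referee or attendance, not only missing weather. To isolate matches whose month is absent from the weather file, `pd.merge` can add a `_merge` column with `indicator=True`. The frames below are a sketch with hypothetical values.

```python
import pandas as pd

# Hypothetical match months and weather rows
matches = pd.DataFrame({"Year": [1984, 2024], "Month": [9, 3]})
weather = pd.DataFrame({"Year": [1984], "Month": [9], "Rain mm": [40.0]})

# indicator=True adds a "_merge" column: "both" if the month was found, "left_only" if not
checked = pd.merge(matches, weather, how="left", on=["Year", "Month"], indicator=True)
print(checked["_merge"].value_counts())
print(checked[checked["_merge"] == "left_only"])
```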

## Survey Data

In [None]:
questionnaire = pd.read_csv("../data/eagles/questionnaire.csv")
questionnaire

In [None]:

questionnaire.rename(columns={
    'Could you provide your postcode? This will help us understand where people are responding from. ':"Postcode",
},inplace=True)

questionnaire


In [None]:


# Keep the quantitative columns by position; .copy() lets get_latlongs add columns safely
questionnaire_quantitive = questionnaire.iloc[:, [1, 2, 3, 14, 15, 16, 17]].copy()
questionnaire_quantitive
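Respondents type postcodes in many forms ("s62de", "S6  2DE"). A consistent format makes it easy to take the outcode (the part before the space) and to group respondents by area. The sketch below relies on the fact that the inward part of a UK postcode is always three characters; `normalise_postcode` is a hypothetical helper.

```python
import re


def normalise_postcode(raw):
    """Upper-case a UK postcode and put one space before the 3-character inward code."""
    if not isinstance(raw, str):
        return None  # e.g. a blank (NaN) answer
    compact = re.sub(r"\s+", "", raw).upper()
    if len(compact) < 5:
        # Too short for a full postcode: probably just an outcode such as 'S6'
        return compact or None
    return compact[:-3] + " " + compact[-3:]


print(normalise_postcode(" s62de "))
```

It could be applied before geolocating with `questionnaire_quantitive["Postcode"] = questionnaire_quantitive["Postcode"].apply(normalise_postcode)`.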

In [None]:
questionnaire_quantitive = get_latlongs(questionnaire_quantitive)
questionnaire_quantitive

In [None]:
questionnaire_quantitive.to_csv("../data/eagles/questionnaire_quantitive.csv", index=False)