## Processing of Scottish Hill Races data (using the Pandas library)

The source data is a `NumPy` array stored in the file `"arr_pandas.npy"`, which contains the results of the Scottish hill races held in 2000. A full description is available on the [documentation page](https://vincentarelbundock.github.io/Rdatasets/doc/DAAG/races2000.html) for the source dataset.

In [1]:
import pandas as pd
import numpy as np

In [2]:
arr = np.load("arr_pandas.npy", allow_pickle=True)
dat = pd.DataFrame(arr)
print('The array converted to a Pandas dataframe:')
dat

The array converted to a Pandas dataframe:


,0,1,2,3,4,5
0,Aonach Mor Gondola,2.0,2000,0.403611,0.518889,uphill
1,Broughton Brewery,2.0,650,0.254444,0.316667,other
2,El-Brim-Ick,3.0,750,0.485833,0.389167,other
3,The Devils Burdens,21.0,4100,2.399722,3.093333,relay
4,Tiso Carnethy,6.0,2500,0.782222,0.919167,hill
...,...,...,...,...,...,...
72,Tinto,4.5,1500,0.499444,0.581111,hill
73,Druim Fada,6.5,1000,0.751111,0.972222,other
74,Elrick,3.6,650,0.358056,0.4425,relay
75,Gondola,2.5,2000,0.387222,0.518889,uphill


In [3]:
print(f'There are {dat.shape[0]} rows and {dat.shape[1]} columns in our dataframe')

There are 77 rows and 6 columns in our dataframe
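
The loading step can be tried without the original file. The sketch below is only an illustration under stated assumptions: the three rows and the file name `demo.npy` are stand-ins, not the real contents of `arr_pandas.npy`. It shows why `allow_pickle=True` is needed (an array with mixed Python objects is stored via pickle) and how the shape is read back.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: three mixed-type rows.
rows = np.array(
    [
        ["Aonach Mor Gondola", 2.0, 2000, 0.403611, 0.518889, "uphill"],
        ["Broughton Brewery", 2.0, 650, 0.254444, 0.316667, "other"],
        ["The Devils Burdens", 21.0, 4100, 2.399722, 3.093333, "relay"],
    ],
    dtype=object,
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npy")  # assumed file name for the demo
    np.save(path, rows)                   # object arrays are written via pickle
    loaded = np.load(path, allow_pickle=True)  # without allow_pickle this raises ValueError

demo = pd.DataFrame(loaded)
n_rows, n_cols = demo.shape
print(f"{n_rows} rows, {n_cols} columns")
print(demo[1].dtype)  # the numeric column is still reported as object
```

Because the array holds generic Python objects, the numeric columns of the resulting dataframe keep the `object` dtype until they are converted explicitly.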


In [4]:
dat.columns = ['id', 'dist', 'climb', 'time', 'timef', 'type']
print('Dataframe with new column names:')
dat

Dataframe with new column names:


,id,dist,climb,time,timef,type
0,Aonach Mor Gondola,2.0,2000,0.403611,0.518889,uphill
1,Broughton Brewery,2.0,650,0.254444,0.316667,other
2,El-Brim-Ick,3.0,750,0.485833,0.389167,other
3,The Devils Burdens,21.0,4100,2.399722,3.093333,relay
4,Tiso Carnethy,6.0,2500,0.782222,0.919167,hill
...,...,...,...,...,...,...
72,Tinto,4.5,1500,0.499444,0.581111,hill
73,Druim Fada,6.5,1000,0.751111,0.972222,other
74,Elrick,3.6,650,0.358056,0.4425,relay
75,Gondola,2.5,2000,0.387222,0.518889,uphill


Explanation of column names:
* `id`: name of the race, used as the row identifier
* `dist`: distance, in miles (on the map)
* `climb`: total height gained during the route, in feet
* `time`: record time in hours
* `timef`: record time in hours for females
* `type`: type of race (*hill*, *marathon*, *relay*, *uphill* or *other*)
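
The column names listed above can also be attached when the dataframe is built, or through `rename`, instead of overwriting `dat.columns`. The following is a minimal sketch on a made-up two-row array (`arr_demo`, `named`, and `renamed` are illustrative names, not part of the notebook).

```python
import numpy as np
import pandas as pd

# Hypothetical two-row array with the same column layout as the source data.
arr_demo = np.array(
    [
        ["Tinto", 4.5, 1500, 0.499444, 0.581111, "hill"],
        ["Elrick", 3.6, 650, 0.358056, 0.4425, "relay"],
    ],
    dtype=object,
)

cols = ["id", "dist", "climb", "time", "timef", "type"]

# Option 1: pass the names directly to the constructor.
named = pd.DataFrame(arr_demo, columns=cols)

# Option 2: map the default integer labels 0..5 to names with rename().
renamed = pd.DataFrame(arr_demo).rename(columns=dict(enumerate(cols)))

print(list(named.columns))
print(list(renamed.columns))
```

Both variants give the same labels; `rename` has the advantage that it fails silently for missing keys rather than raising on a length mismatch, so the constructor form is the stricter of the two.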

In [5]:
dat.index = dat['id']  # use the race name as the row label
name = "Norman's Law"
print(f'The climb of the race {name} is {dat.loc[name, "climb"]} feet')

The climb of the race Norman's Law is 700 feet


In [6]:
print('Distance, climb and record time of the first 10 races:')
dat.iloc[:10][['dist', 'climb', 'time']]  # first 10 rows by position, three columns by label

Distance, climb and record time of the first 10 races:


id (index),dist,climb,time
Aonach Mor Gondola,2.0,2000,0.403611
Broughton Brewery,2.0,650,0.254444
El-Brim-Ick,3.0,750,0.485833
The Devils Burdens,21.0,4100,2.399722
Tiso Carnethy,6.0,2500,0.782222
Criffel,7.0,1800,0.793333
Chapelgill,1.5,1400,0.314444
Norman's Law,5.0,700,0.464167
Craig Dunain,6.0,900,0.546111
Knockfarrel,5.0,1200,0.623333
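
The two selections above mix label-based and position-based access. As a rough illustration on a made-up three-row frame (the name `demo` is hypothetical), the sketch below contrasts `loc`, which looks rows up by index label, with `iloc` and `head`, which go by position.

```python
import pandas as pd

# Hypothetical miniature version of the races dataframe.
demo = pd.DataFrame(
    {
        "id": ["Chapelgill", "Norman's Law", "Craig Dunain"],
        "dist": [1.5, 5.0, 6.0],
        "climb": [1400, 700, 900],
        "time": [0.314444, 0.464167, 0.546111],
    }
)
demo = demo.set_index("id", drop=False)  # keep 'id' as a column as well

# Label-based: row "Norman's Law", column "climb".
by_label = demo.loc["Norman's Law", "climb"]

# Position-based: first two rows, then pick columns by name.
first_two = demo.iloc[:2][["dist", "climb", "time"]]
same_rows = demo.head(2)[["dist", "climb", "time"]]

print(by_label)
print(first_two)
```

`set_index("id", drop=False)` has the same effect here as assigning `dat.index = dat['id']`: the race name becomes the row label and stays available as a regular column.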


In [7]:
print('Summary of the dataframe:')
dat.info()

Summary of the dataframe:
<class 'pandas.core.frame.DataFrame'>
Index: 77 entries, Aonach Mor Gondola to Greenmantle
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      77 non-null     object
 1   dist    77 non-null     object
 2   climb   77 non-null     object
 3   time    77 non-null     object
 4   timef   75 non-null     object
 5   type    77 non-null     object
dtypes: object(6)
memory usage: 6.3+ KB


None of the columns in the dataframe has a float dtype: all six are stored as `object`.
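
If numeric dtypes are wanted, the object columns can be converted explicitly. The sketch below is one possible approach, shown on a made-up frame (`demo` and `numeric` are illustrative names); it assumes the columns contain only numbers and missing values, as the `info()` output for `timef` suggests.

```python
import pandas as pd

# Hypothetical object-typed columns, including one missing value in 'timef'.
demo = pd.DataFrame(
    {
        "dist": [2.0, 21.0, 6.0],
        "climb": [2000, 4100, 2500],
        "timef": [0.518889, None, 0.919167],
    },
    dtype=object,
)
print(demo.dtypes)  # every column is object

# pd.to_numeric picks a numeric dtype per column; None becomes NaN.
numeric = demo.apply(pd.to_numeric)
print(numeric.dtypes)
```

After the conversion, arithmetic and comparisons run on native numeric arrays rather than on individual Python objects, and methods such as `describe()` treat the columns as numbers.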

In [8]:
print('Results of the relay races:')
dat[dat['type'] == 'relay']

Results of the relay races:


id (index),id,dist,climb,time,timef,type
The Devils Burdens,The Devils Burdens,21.0,4100,2.399722,3.093333,relay
Ben Rha,Ben Rha,7.5,800,0.738056,0.997222,relay
Elrick,Elrick,3.6,650,0.358056,0.4425,relay


Next, we select the rows for hill races (`type` equal to *hill*) whose total climb exceeds 1000 feet and count how many such races there are.

In [9]:
height = 1000
# Both conditions must hold; each comparison needs its own parentheses
hh = dat[(dat['type'] == 'hill') & (dat['climb'] > height)]
print(f'{hh.shape[0]} hill races have a climb of more than {height} feet')

39 hill races have a climb of more than 1000 feet
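
The counting step can also be done on the boolean mask itself, without building the filtered dataframe first. This is a small sketch on made-up values (`demo`, `mask`, and the two counters are hypothetical names), not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical sample: race type and climb in feet.
demo = pd.DataFrame(
    {
        "type": ["hill", "hill", "other", "hill"],
        "climb": [2500, 700, 1000, 1400],
    }
)

height = 1000
# & has higher precedence than == and >, so each comparison is parenthesized.
mask = (demo["type"] == "hill") & (demo["climb"] > height)

count_via_sum = int(mask.sum())          # True counts as 1
count_via_shape = demo[mask].shape[0]    # same number via the filtered frame
print(count_via_sum, count_via_shape)
```

Dropping the parentheses would make Python evaluate `'hill' & demo["climb"]` first, which raises an error rather than producing a wrong count.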


In [10]:
height1 = 4000
time1 = 0.5
print(f'Races with a climb of more than {height1} feet or a record time under {time1} hours:')
dat[(dat['climb'] > height1) | (dat['time'] < time1)]

Races with a climb of more than 4000 feet or a record time under 0.5 hours:


id (index),id,dist,climb,time,timef,type
Aonach Mor Gondola,Aonach Mor Gondola,2.0,2000,0.403611,0.518889,uphill
Broughton Brewery,Broughton Brewery,2.0,650,0.254444,0.316667,other
El-Brim-Ick,El-Brim-Ick,3.0,750,0.485833,0.389167,other
The Devils Burdens,The Devils Burdens,21.0,4100,2.399722,3.093333,relay
Chapelgill,Chapelgill,1.5,1400,0.314444,0.376667,hill
Norman's Law,Norman's Law,5.0,700,0.464167,0.609167,hill
Screel,Screel,4.0,1300,0.458889,0.543611,hill
Hunters Bog,Hunters Bog,4.5,800,0.409444,0.477778,other
Aberfoyle Dash,Aberfoyle Dash,3.0,1000,0.3375,0.381389,uphill
Manx Mountain,Manx Mountain,30.5,8000,4.469722,5.523333,marathon
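
The same "either condition" filter can be written with `DataFrame.query`, which some readers find easier to scan. The sketch below uses made-up numeric data (`demo`, `via_mask`, `via_query` are illustrative names) and assumes the columns already have numeric dtypes.

```python
import pandas as pd

# Hypothetical sample: climb in feet, record time in hours.
demo = pd.DataFrame(
    {
        "climb": [2000, 4100, 2500, 8000],
        "time": [0.403611, 2.399722, 0.782222, 4.469722],
    }
)

height1 = 4000
time1 = 0.5

# Boolean-mask form: | is the element-wise OR.
via_mask = demo[(demo["climb"] > height1) | (demo["time"] < time1)]

# query() form: @name refers to a Python variable in the calling scope.
via_query = demo.query("climb > @height1 or time < @time1")

print(via_query)
```

Both expressions keep the rows where at least one condition is true, so a race that satisfies both appears only once.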


In [11]:
dat['time_min'] = dat['time'] * 60
print('The dataframe with a new \'time_min\' column containing the route time measured in minutes:')
dat

The dataframe with a new 'time_min' column containing the route time measured in minutes:


id (index),id,dist,climb,time,timef,type,time_min
Aonach Mor Gondola,Aonach Mor Gondola,2.0,2000,0.403611,0.518889,uphill,24.216667
Broughton Brewery,Broughton Brewery,2.0,650,0.254444,0.316667,other,15.266667
El-Brim-Ick,El-Brim-Ick,3.0,750,0.485833,0.389167,other,29.15
The Devils Burdens,The Devils Burdens,21.0,4100,2.399722,3.093333,relay,143.983333
Tiso Carnethy,Tiso Carnethy,6.0,2500,0.782222,0.919167,hill,46.933333
...,...,...,...,...,...,...,...
Tinto,Tinto,4.5,1500,0.499444,0.581111,hill,29.966667
Druim Fada,Druim Fada,6.5,1000,0.751111,0.972222,other,45.066667
Elrick,Elrick,3.6,650,0.358056,0.4425,relay,21.483333
Gondola,Gondola,2.5,2000,0.387222,0.518889,uphill,23.233333
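
The unit conversion above multiplies every value in the column by 60 in one vectorized step. As a hedged aside, the sketch below shows two related options on made-up data (`demo`, `with_minutes`, and `as_timedelta` are hypothetical names): `assign`, which returns a new dataframe instead of modifying the original, and `pd.to_timedelta`, which turns hours into duration values.

```python
import pandas as pd

# Hypothetical record times in hours.
demo = pd.DataFrame({"time": [0.5, 0.25, 2.0]})

# assign() leaves 'demo' untouched and returns a copy with the extra column.
with_minutes = demo.assign(time_min=demo["time"] * 60)

# Interpret the hours as durations; useful for formatting as HH:MM:SS.
as_timedelta = pd.to_timedelta(demo["time"], unit="h")

print(with_minutes)
print(as_timedelta)
```

Whether to keep plain minutes or a timedelta column depends on what follows: floats are convenient for arithmetic and plotting, while timedeltas print in a human-readable form.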


In [12]:
dat['year'] = 2000
print('The dataframe with a new \'year\' column containing the year of the competition:')
dat

The dataframe with a new 'year' column containing the year of the competition:


id (index),id,dist,climb,time,timef,type,time_min,year
Aonach Mor Gondola,Aonach Mor Gondola,2.0,2000,0.403611,0.518889,uphill,24.216667,2000
Broughton Brewery,Broughton Brewery,2.0,650,0.254444,0.316667,other,15.266667,2000
El-Brim-Ick,El-Brim-Ick,3.0,750,0.485833,0.389167,other,29.15,2000
The Devils Burdens,The Devils Burdens,21.0,4100,2.399722,3.093333,relay,143.983333,2000
Tiso Carnethy,Tiso Carnethy,6.0,2500,0.782222,0.919167,hill,46.933333,2000
...,...,...,...,...,...,...,...,...
Tinto,Tinto,4.5,1500,0.499444,0.581111,hill,29.966667,2000
Druim Fada,Druim Fada,6.5,1000,0.751111,0.972222,other,45.066667,2000
Elrick,Elrick,3.6,650,0.358056,0.4425,relay,21.483333,2000
Gondola,Gondola,2.5,2000,0.387222,0.518889,uphill,23.233333,2000
