# <img src="./resources/GA.png" width="25" height="25" /> <span style="color:Blue">DSI Capstone:  MTB Trail Recommender Engine</span> 
---
## <span style="color:Green">Preprocessing</span>      

#### Ryan McDonald - General Assembly

---

### Notebook Contents:

- [Content-Based Recommender Prep](#intro)    
    - [Arizona Content Recommender](#recaz)
    - [Utah Content Recommender](#recut) 
- [User-Based Binary Recommender Prep](#user_rec)
    - [Arizona User Recommender](#azuser)
    - [Utah User Recommender](#utuser) 

**Imports**

In [1]:
# basic imports
import numpy as np
import pandas as pd
import sys

# general processing, CSV manipulation
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances, cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# # Spatial distance module
# import geopandas as gpd
# from shapely.geometry import Point
# from shapely.ops import nearest_points

<a id='intro'></a>
## 1. Content-Based Recommender

Now that the data is cleaned and formatted, only a few preprocessing steps remain before we can build a reliable recommender system. We'll start with a content-based recommender that uses the cleaned trail statistics to show users the ten trails most similar to a trail of their choosing. The Streamlit app will let a user search for a 'starter' trail by filtering on the characteristics they enjoy most. That trail is then passed to the Streamlit-based content recommender, which displays the top 10 most similar trails. The user can then investigate those trails and plan a great mountain bike ride!

## Read Data

### Arizona Trail Data

In [2]:
# reading in the scaled, one_hot_encoded dataset for the recommender system
az_trails = pd.read_csv('./data/recommender_data/az_trail_data.csv')
az_trails = az_trails.set_index('trail_name')
az_trails.head()

trail_name,length,longitude,latitude,popularity,rating,tot_climb,tot_descent,ave_grade,max_grade,max_elevation,...,difficulty_intermediate,difficulty_intermediate/difficult,difficulty_very difficult,dog_policy_leashed,dog_policy_no dogs,dog_policy_off-leash,dog_policy_unknown,e_bike_policy_allowed,e_bike_policy_not allowed,e_bike_policy_unknown
Hiline Trail,0.022399,0.619678,0.507727,1.0,0.94,0.022963,0.057739,0.315789,0.357143,0.429345,...,0,0,1,0,0,0,1,0,0,1
Slim Shady Trail,0.018786,0.6171,0.508725,0.998953,0.88,0.018666,0.021932,0.210526,0.112245,0.412061,...,0,1,0,0,0,0,1,0,0,1
Mescal,0.017341,0.637923,0.498336,0.997906,0.92,0.01451,0.013791,0.157895,0.112245,0.435423,...,0,1,0,0,0,0,1,0,0,1
Chuckwagon,0.039017,0.637893,0.498445,0.996859,0.9,0.039375,0.040625,0.210526,0.132653,0.432479,...,1,0,0,0,0,0,1,0,0,1
Tortolita Preserve Loop,0.070087,0.197366,0.626201,0.995812,0.84,0.036627,0.0432,0.105263,0.040816,0.254416,...,1,0,0,0,0,0,1,0,0,1


In [3]:
az_trails.shape, az_trails.isnull().sum().sort_values(ascending = False).head()

((956, 24),
 longitude                    11
 latitude                     11
 e_bike_policy_unknown         0
 e_bike_policy_not allowed     0
 popularity                    0
 dtype: int64)

### Utah Trail Data

In [4]:
# reading in the scaled, one_hot_encoded dataset for the recommender system
ut_trails = pd.read_csv('./data/recommender_data/ut_trail_data.csv')
ut_trails = ut_trails.set_index('trail_name')
ut_trails.head()

trail_name,length,longitude,latitude,popularity,rating,tot_climb,tot_descent,ave_grade,max_grade,max_elevation,...,difficulty_intermediate,difficulty_intermediate/difficult,difficulty_very difficult,dog_policy_leashed,dog_policy_no dogs,dog_policy_off-leash,dog_policy_unknown,e_bike_policy_allowed,e_bike_policy_not allowed,e_bike_policy_unknown
Thunder Mountain Trail #33098,0.065165,0.140063,0.312952,1.0,0.94,0.052217,0.14821,0.3,0.409091,0.632152,...,0,1,0,0,0,1,0,0,0,1
Wasatch Crest,0.100563,0.726322,0.462306,0.998922,0.96,0.082152,0.234174,0.3,0.393939,0.817796,...,0,1,0,0,1,0,0,0,1,0
Captain Ahab,0.033789,0.304305,0.87393,0.997845,0.94,0.024706,0.086493,0.3,0.348485,0.246302,...,0,0,0,1,0,0,0,0,1,0
Wire Mesa Loop,0.059533,0.025333,0.145997,0.996767,0.92,0.032437,0.03659,0.1,0.181818,0.200894,...,0,1,0,0,0,0,1,1,0,0
Ramblin',0.026549,0.328357,0.838821,0.99569,0.94,0.014778,0.035091,0.15,0.181818,0.28999,...,0,1,0,1,0,0,0,0,1,0


#### Creating a Content-Based Recommender

In [5]:
def content_recommend(df):
    
    # creating the sparse matrix
    sparse_matrix = sparse.csr_matrix(df.fillna(0))
       
    # calculating pairwise distances and building into a dataframe
    rec = pairwise_distances(sparse_matrix, metric = 'cosine')
    
    # saving pairwise matrix as a dataframe
    rec = pd.DataFrame(rec, index = df.index, columns = df.index)
    
    # return the dataframe
    return rec
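
To make the distance values in the tables that follow easier to read, here is a minimal sketch of what cosine distance measures, written with plain NumPy rather than the `pairwise_distances` call used above. The three feature rows are made-up values for illustration, not rows from the trail data, and `cosine_distance` is a helper name introduced only for this sketch.

```python
import numpy as np

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two 1-D feature vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical scaled feature rows (not taken from the trail data)
trail_a = [0.2, 0.9, 1.0, 0.0]
trail_b = [0.4, 1.8, 2.0, 0.0]   # same direction as trail_a, twice the magnitude
trail_c = [0.0, 0.0, 0.0, 1.0]   # shares no non-zero feature with trail_a

d_ab = cosine_distance(trail_a, trail_b)   # rows pointing the same way
d_ac = cosine_distance(trail_a, trail_c)   # rows with nothing in common

print(d_ab, d_ac)
```

Here `d_ab` is numerically 0 because the two rows point in the same direction, and `d_ac` is 1 because they share no non-zero feature. With non-negative features, as the scaled and one-hot columns here appear to be, distances stay between 0 and 1.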

### Arizona Trails Pairwise_Distance DF

In [6]:
az_rec = content_recommend(az_trails)
az_rec

trail_name,Hiline Trail,Slim Shady Trail,Mescal,Chuckwagon,Tortolita Preserve Loop,Lone Cactus Loop,Apache Wash Loop,Desperado Loop,North Loop,Bug Springs,...,Monument Trail,Spine Trail,Spine Trail to Ridge Trail Connector,Far West Trail,Alamo Springs Spur Trail,Trail C,Trail G,Trail H,Trail D,Kain Trail
Hiline Trail,0.000000,0.174012,0.173635,0.171861,0.210020,0.209322,0.385569,0.382047,0.373010,0.185404,...,0.619941,0.422191,0.411123,0.432790,0.404184,0.410994,0.400494,0.406358,0.408603,0.411950
Slim Shady Trail,0.174012,0.000000,0.000515,0.170912,0.204185,0.205499,0.382805,0.382360,0.374641,0.199690,...,0.615020,0.416384,0.417889,0.420689,0.421725,0.408406,0.406637,0.185855,0.407863,0.414595
Mescal,0.173635,0.000515,0.000000,0.169433,0.205173,0.206346,0.381109,0.381116,0.373362,0.201811,...,0.616264,0.422664,0.427050,0.424741,0.432608,0.413944,0.413776,0.194056,0.413974,0.422610
Chuckwagon,0.171861,0.170912,0.169433,0.000000,0.026550,0.029780,0.382711,0.382585,0.373995,0.198061,...,0.614656,0.419340,0.204844,0.423613,0.225401,0.189014,0.407759,0.409140,0.189339,0.416799
Tortolita Preserve Loop,0.210020,0.204185,0.205173,0.026550,0.000000,0.001065,0.395093,0.375853,0.370878,0.209881,...,0.645532,0.416363,0.197769,0.409590,0.230118,0.192250,0.430785,0.428032,0.194174,0.414963
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Trail C,0.410994,0.408406,0.413944,0.189014,0.192250,0.197469,0.656729,0.668259,0.660269,0.425583,...,0.576559,0.296292,0.015321,0.311230,0.060976,0.000000,0.288635,0.288285,0.000494,0.303732
Trail G,0.400494,0.406637,0.413776,0.407759,0.430785,0.432210,0.426115,0.440730,0.434386,0.407729,...,0.576701,0.012339,0.282841,0.319346,0.289132,0.288635,0.000000,0.283847,0.285194,0.292341
Trail H,0.406358,0.185855,0.194056,0.409140,0.428032,0.430074,0.657826,0.667859,0.658605,0.418125,...,0.576370,0.295993,0.290961,0.313520,0.306951,0.288285,0.283847,0.000000,0.286580,0.298793
Trail D,0.408603,0.407863,0.413974,0.189339,0.194174,0.199227,0.657807,0.668152,0.659034,0.420332,...,0.575970,0.295465,0.011559,0.312430,0.052484,0.000494,0.285194,0.286580,0.000000,0.299151


### Utah Trails Pairwise_Distance DF

In [7]:
ut_rec = content_recommend(ut_trails)
ut_rec

trail_name,Thunder Mountain Trail #33098,Wasatch Crest,Captain Ahab,Wire Mesa Loop,Ramblin',Rush,Bull Run,Big Mesa,Getaway,Dino-Flow,...,Jones Ranch Trail #123 Alternate Access,Sovereign Connect,Whales Connect,Humpback,Flat Iron Mesa 4x4 Jeep Road Spur,BST Access Trail,The Farm - Green Trail,Hi Line,Carin-Age,Lasso
Thunder Mountain Trail #33098,0.000000,0.337835,0.551475,0.402220,0.385433,0.373467,0.375111,0.548581,0.553409,0.405215,...,0.586680,0.662385,0.650733,0.647736,0.826635,0.653027,0.704335,0.623006,0.618705,0.842940
Wasatch Crest,0.337835,0.000000,0.372677,0.429697,0.213528,0.178302,0.204471,0.364152,0.364672,0.543282,...,0.694661,0.804063,0.842944,0.840320,0.771875,0.747212,0.624618,0.777175,0.773478,0.592971
Captain Ahab,0.551475,0.372677,0.000000,0.603297,0.172981,0.531512,0.170624,0.172176,0.177188,0.350722,...,0.809777,0.793656,0.918735,0.915005,0.768928,0.828743,0.676294,0.853385,0.843765,0.594059
Wire Mesa Loop,0.402220,0.429697,0.603297,0.000000,0.418805,0.237567,0.418482,0.602992,0.610502,0.620838,...,0.701861,0.718164,0.705162,0.704265,0.490050,0.723431,0.368349,0.701449,0.699957,0.712317
Ramblin',0.385433,0.213528,0.172981,0.418805,0.000000,0.369284,0.001984,0.169689,0.172197,0.347940,...,0.801376,0.792799,0.909435,0.908101,0.776620,0.829494,0.678424,0.848264,0.846979,0.585471
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
BST Access Trail,0.653027,0.747212,0.828743,0.723431,0.829494,0.779716,0.822303,0.824173,0.825768,0.620860,...,0.042294,0.313384,0.348712,0.078478,0.555744,0.000000,0.544516,0.038585,0.036273,0.312054
The Farm - Green Trail,0.704335,0.624618,0.676294,0.368349,0.678424,0.456779,0.677014,0.494399,0.504036,0.507482,...,0.567359,0.382348,0.418159,0.659477,0.166359,0.544516,0.000000,0.604320,0.599345,0.605575
Hi Line,0.623006,0.777175,0.853385,0.701449,0.848264,0.835228,0.840080,0.842774,0.839346,0.633630,...,0.016246,0.307568,0.300528,0.013694,0.551729,0.038585,0.604320,0.000000,0.003339,0.292346
Carin-Age,0.618705,0.773478,0.843765,0.699957,0.846979,0.827146,0.836662,0.840916,0.838316,0.633379,...,0.020773,0.309848,0.306956,0.018523,0.547404,0.036273,0.599345,0.003339,0.000000,0.298653


<a id='recaz'></a>
### Arizona Trail Content Recommender

Lower values indicate greater similarity between trails (**'0'** is a trail compared with itself, **'1'** is not similar at all).

In [8]:
# Which 10 trails are most similar to Hiline Trail?
# This field is a user input within the streamlit app!

az_rec['Hiline Trail'].sort_values().head(11)[1:]

trail_name
Hangover Trail                    0.000617
Kellog/Incinerator Ridge          0.042147
Tabletop                          0.042998
Western Loop Trail                0.051682
Green Mountain                    0.070407
Baby Jesus Trail                  0.087600
Hog Heaven                        0.166381
Cathedral Rock Connector Trail    0.167931
High on the Hog                   0.169619
Broken Arrow Trail                0.169760
Name: Hiline Trail, dtype: float64

'Hiline Trail' is most similar to 'Hangover Trail'! Several others share many characteristics!

In [9]:
# Creating a trail search term for Arizona Trails:
# This will bring up trails containing any part of the search term. 
search = "Hiline"
trails = az_trails[az_trails.index.str.contains(search)].index
for trail in trails:
    print(trail)
    print("Popularity: ", az_trails.loc[trail, 'popularity'])
    # count() here tallies the non-null feature columns for the trail, not user ratings
    print("Non-null Features: ", az_trails.T[trail].count())
    print("")
    print("10 Closest Trails")
    print(az_rec[trail].sort_values()[1:11])
    print("")
    print("*"*35)
    print("")

Hiline Trail
Popularity:  1.0
Non-null Features:  24

10 Closest Trails
trail_name
Hangover Trail                    0.000617
Kellog/Incinerator Ridge          0.042147
Tabletop                          0.042998
Western Loop Trail                0.051682
Green Mountain                    0.070407
Baby Jesus Trail                  0.087600
Hog Heaven                        0.166381
Cathedral Rock Connector Trail    0.167931
High on the Hog                   0.169619
Broken Arrow Trail                0.169760
Name: Hiline Trail, dtype: float64

***********************************



<a id='recut'></a>
### Utah Trail Content Recommender

Lower values indicate greater similarity between trails (**'0'** is a trail compared with itself, **'1'** is not similar at all).

In [10]:
# Which 10 trails are most similar to Portal?
# This field is a user input within the streamlit app!

ut_rec['Portal'].sort_values().head(11)[1:]

trail_name
Jacob's (Jackson's) Ladder    0.042215
Gold Bar Rim                  0.061473
Mt. Van Cott Trail            0.089496
Four Loko                     0.100298
La Dee Duh                    0.171413
Hell Canyon                   0.175551
Jackson                       0.188630
UFO                           0.195334
Mega Steps                    0.198167
Top of the World Jeep Road    0.202476
Name: Portal, dtype: float64

'Jacob's (Jackson's) Ladder' is most similar to 'Portal'! Several others share many characteristics!

In [11]:
# Creating a trail search term for Utah trails:
# This will bring up trails containing any part of the search term. 
search = "Portal"
trails = ut_trails[ut_trails.index.str.contains(search)].index
for trail in trails:
    print(trail)
    print("Popularity: ", ut_trails.loc[trail, 'popularity'])
    # count() here tallies the non-null feature columns for the trail, not user ratings
    print("Non-null Features: ", ut_trails.T[trail].count())
    print("")
    print("10 Closest Trails")
    print(ut_rec[trail].sort_values()[1:11])
    print("")
    print("*"*35)
    print("")

Portal
Popularity:  0.9773706896551724
Non-null Features:  24

10 Closest Trails
trail_name
Jacob's (Jackson's) Ladder    0.042215
Gold Bar Rim                  0.061473
Mt. Van Cott Trail            0.089496
Four Loko                     0.100298
La Dee Duh                    0.171413
Hell Canyon                   0.175551
Jackson                       0.188630
UFO                           0.195334
Mega Steps                    0.198167
Top of the World Jeep Road    0.202476
Name: Portal, dtype: float64

***********************************

Poison Spider - Portal Connector
Popularity:  0.03771551724137934
Non-null Features:  24

10 Closest Trails
trail_name
7-Up to Rocky Tops Connector     0.049794
Jedi Slickrock                   0.051120
Bull Run Connector               0.055068
Sidestep (North)                 0.061445
Kane Creek Canyon Trail          0.063054
Inside Passage                   0.063625
Baby Steps Singletrack Loop 2    0.066205
Baby Steps Singletrack Loop 1    0.07226
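
The search cells above pass the term straight to `str.contains`, which by default treats it as a case-sensitive regular expression. Since trail names can contain characters such as parentheses, a literal, case-insensitive match may be the safer choice for user input. The sketch below shows that variant on a toy index; the names are copied from the outputs above, and the variable names are introduced only for this example.

```python
import pandas as pd

# Trail names copied from the outputs above, used here as a toy index
names = pd.Index([
    "Portal",
    "Poison Spider - Portal Connector",
    "Jacob's (Jackson's) Ladder",
    "Jackson",
])

# Case-insensitive, literal (non-regex) substring search
portal_hits = names[names.str.contains("portal", case=False, regex=False)]

# "(" alone would be an unbalanced group in a regex; regex=False treats it literally
ladder_hits = names[names.str.contains("(Jackson", regex=False)]

print(list(portal_hits))
print(list(ladder_hits))
```

With `regex=False`, a term typed by a user is matched character for character, so punctuation in trail names cannot change the meaning of the search or raise a pattern error.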

<a id='user_rec'></a>
## 2. User-Based (Binary) Recommender
## Read Data: Arizona and Utah User Data

In [12]:
# reading in the cleaned, sorted Arizona user dataset for the recommender system
az_users = pd.read_csv('./data/all_arizona_users.csv')
az_users.head()

Unnamed: 0,user_name,trail_name
0,Maxx Byerly,Hiline Trail
1,Cameron McFarland,Hiline Trail
2,Ascanio Pignatelli,Hiline Trail
3,Sabrina Katharina,Hiline Trail
4,Clayton Burtsfield,Hiline Trail


In [13]:
# shape of df and verifying no nulls!
az_users.shape, az_users.isnull().sum().sort_values(ascending = False).head()

((5192, 2),
 trail_name    0
 user_name     0
 dtype: int64)

In [14]:
# reading in the cleaned, sorted Utah user dataset for the recommender system
ut_users = pd.read_csv('./data/all_utah_users.csv')
ut_users.head()

Unnamed: 0,user_name,trail_name
0,MadHamish H,Thunder Mountain Trail #33098
1,Matt Lane,Thunder Mountain Trail #33098
2,Phil Broadbent,Thunder Mountain Trail #33098
3,Jacob Crockett,Thunder Mountain Trail #33098
4,Heather Bond,Thunder Mountain Trail #33098


In [15]:
# shape of df and verifying no nulls!
ut_users.shape, ut_users.isnull().sum().sort_values(ascending = False).head()

((7346, 2),
 trail_name    0
 user_name     0
 dtype: int64)

#### Creating a User-Based Binary Recommender

In [16]:
def user_recommend(df):
    
    # adding binary rating column for trails that users rated
    # (this modifies the caller's DataFrame in place; later cells rely on the new column)
    df['binary_rate'] = 1
    
    # transforming to a pivot table
    pivot = df.pivot_table(index='user_name', columns= 'trail_name', values = 'binary_rate')
    
    # creating the sparse matrix
    sparse_users = sparse.csc_matrix(pivot.fillna(0))
       
    # calculating pairwise distances and building into a dataframe
    user_rec = pairwise_distances(sparse_users, metric = 'cosine')
   
    # saving pairwise matrix as a dataframe
    rec = pd.DataFrame(user_rec, index = pivot.index, columns = pivot.index)
    
    # return the dataframe
    return rec
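
For 0/1 rating vectors, cosine distance reduces to a simple count: one minus the number of shared trails divided by the square root of the product of the two users' rating counts. The sketch below works that arithmetic through with sets; `binary_cosine_distance` and the trail labels are hypothetical names used only for illustration.

```python
from math import sqrt

def binary_cosine_distance(trails_a, trails_b):
    """Cosine distance between two users described by the sets of trails they rated.

    Assumes both sets are non-empty (every user in the pivot has at least one rating).
    """
    a, b = set(trails_a), set(trails_b)
    shared = len(a & b)
    return 1.0 - shared / sqrt(len(a) * len(b))

# Hypothetical user who rated a single trail
one_trail = {"trail_1"}

same_trail = binary_cosine_distance(one_trail, {"trail_1"})                          # identical sets
two_trails = binary_cosine_distance(one_trail, {"trail_1", "trail_2"})               # 1 - 1/sqrt(2)
three_trails = binary_cosine_distance(one_trail, {"trail_1", "trail_2", "trail_3"})  # 1 - 1/sqrt(3)
no_overlap = binary_cosine_distance(one_trail, {"trail_2"})                          # nothing shared

print(round(same_trail, 6), round(two_trails, 6), round(three_trails, 6), round(no_overlap, 6))
```

Values such as 0.292893 and 0.42265, which recur in the user tables in this notebook, are consistent with a user who has one rating being compared against users who rated that same trail plus one or two others.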

### Arizona Users Pairwise_Distance DF

In [17]:
az_user_rec = user_recommend(az_users)
az_user_rec

user_name,A H,AJ Wanta,Aaron Cholewa,Aaron Davies,Aaron Frank,Aaron Hickson,Aaron Johnson,Aaron Lovato,Abe Ferraro,Abe Gold,...,sal serrano,sam schwann,skelldify,stuart schwartz,theiner Heiner,trevjens,victor thompson,yannick,Þorvarður Hálfdanarson,❤️
A H,0.0,1.0,1.000000,1.0,1.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
AJ Wanta,1.0,0.0,1.000000,1.0,1.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Aaron Cholewa,1.0,1.0,0.000000,1.0,0.666667,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Aaron Davies,1.0,1.0,1.000000,0.0,1.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Aaron Frank,1.0,1.0,0.666667,1.0,0.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
trevjens,1.0,1.0,1.000000,1.0,1.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0
victor thompson,1.0,1.0,1.000000,1.0,1.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0
yannick,1.0,1.0,1.000000,1.0,1.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0
Þorvarður Hálfdanarson,1.0,1.0,1.000000,1.0,1.000000,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0


### Utah Users Pairwise_Distance DF

In [18]:
ut_user_rec = user_recommend(ut_users)
ut_user_rec

user_name,#deeznutzfosho,46and2,A B,A Estrada,A MG,A Rodriguez,AKA Surfer,AMANDA MELESSA,AOSR,Aaron Anderstrom,...,tharlow harlow,thehiker 2000,theiner Heiner,tourjee Tourjee,tracy bilhorn,tyler bostwick,tyte 754,wimolrat Tangtiphongkul,zachnielsen999 Nielsen,สีดำ ภูเขา
#deeznutzfosho,0.0,1.0,0.0,1.0,1.0,0.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,0.42265,1.0,1.0,1.0,1.0,1.0
46and2,1.0,0.0,1.0,1.0,1.0,1.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,1.00000,1.0,1.0,1.0,1.0,1.0
A B,0.0,1.0,0.0,1.0,1.0,0.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,0.42265,1.0,1.0,1.0,1.0,1.0
A Estrada,1.0,1.0,1.0,0.0,1.0,1.0,0.833333,1.0,0.711325,1.0,...,1.0,1.0,1.0,1.0,1.00000,1.0,1.0,1.0,1.0,1.0
A MG,1.0,1.0,1.0,1.0,0.0,1.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,1.00000,1.0,1.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
tyler bostwick,1.0,1.0,1.0,1.0,1.0,1.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,1.00000,0.0,1.0,1.0,1.0,1.0
tyte 754,1.0,1.0,1.0,1.0,1.0,1.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,1.00000,1.0,0.0,1.0,1.0,1.0
wimolrat Tangtiphongkul,1.0,1.0,1.0,1.0,1.0,1.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,1.00000,1.0,1.0,0.0,1.0,1.0
zachnielsen999 Nielsen,1.0,1.0,1.0,1.0,1.0,1.0,1.000000,1.0,1.000000,1.0,...,1.0,1.0,1.0,1.0,1.00000,1.0,1.0,1.0,0.0,1.0


<a id='azuser'></a>
### Arizona User-Based Binary Recommender
Lower values indicate greater similarity between users (**'0'** is a user compared with themselves or with a user who rated exactly the same trails, **'1'** is not similar at all).

In [19]:
# Which 10 users are most similar to A H?

az_user_rec['A H'].sort_values().head(11)[1:]

user_name
Soloman Picoult          0.000000
Josh Richart             0.000000
Brian Derrick            0.292893
Brandon Sudeith          0.422650
Bob Spak                 0.500000
Mark Smith               0.552786
Nikki McIntyre           0.666667
Pablo Cortez             0.750000
Happy Cycling            0.781782
Michael Bartholomeusz    1.000000
Name: A H, dtype: float64

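'Soloman Picoult' and 'Josh Richart' are at distance 0 from 'A H', which means they rated exactly the same trails. Two other users are also close (distance below 0.5), after which users quickly become dissimilar; a distance of 1.0 means no rated trails in common.
Since the search output below shows only one rating for 'A H', these matches reflect overlap on a single trail rather than firm evidence of riding partners or riding ability.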
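
Slicing off the first row after `sort_values()` assumes the queried user sorts first. When other users are also at distance 0, as in the output above, that ordering is not guaranteed, so a slightly more defensive variant drops the query label explicitly. This is a sketch on a made-up 3x3 distance table; `top_n_closest` and the user labels are hypothetical, and it assumes the index labels are unique.

```python
import pandas as pd

def top_n_closest(dist_df, name, n=10):
    """Return the n smallest distances to `name`, excluding `name` itself.

    Assumes the labels in dist_df are unique.
    """
    distances = dist_df[name].drop(labels=name)               # remove the self-distance by label
    return distances.sort_values(kind="mergesort").head(n)    # stable sort keeps tie order

# Made-up symmetric distance table: user_b is tied with user_a at distance 0
labels = ["user_b", "user_a", "user_c"]
toy_dist = pd.DataFrame(
    [[0.0, 0.0, 0.5],
     [0.0, 0.0, 0.5],
     [0.5, 0.5, 0.0]],
    index=labels, columns=labels,
)

closest = top_n_closest(toy_dist, "user_a", n=2)
print(closest)
```

Dropping by label keeps `user_b` in the result even though it ties with the queried user at distance 0, which a positional `[1:]` slice could discard depending on sort order.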

In [26]:
# Creating a user search term:
# This will bring up users containing any part of the search term. 
az_pivot = az_users.pivot_table(index='user_name', columns= 'trail_name', values = 'binary_rate')

search = "A H"
users = az_pivot[az_pivot.index.str.contains(search)].index
for user in users:
    print(user)
    print("Average Rating: ", az_pivot.loc[user, :].mean())
    print("Number of Ratings: ", az_pivot.T[user].count())
    print("")
    print("10 Closest Users")
    print(az_user_rec[user].sort_values()[1:11])
    print("")
    print("*"*35)


A H
Average Rating:  1.0
Number of Ratings:  1

10 Closest Users
user_name
Soloman Picoult          0.000000
Josh Richart             0.000000
Brian Derrick            0.292893
Brandon Sudeith          0.422650
Bob Spak                 0.500000
Mark Smith               0.552786
Nikki McIntyre           0.666667
Pablo Cortez             0.750000
Happy Cycling            0.781782
Michael Bartholomeusz    1.000000
Name: A H, dtype: float64

***********************************


<a id='utuser'></a>
### Utah User-Based Binary Recommender
Lower values indicate greater similarity between users (**'0'** is a user compared with themselves or with a user who rated exactly the same trails, **'1'** is not similar at all).

In [28]:
# Which 10 users are most similar to AKA Surfer?

ut_user_rec['AKA Surfer'].sort_values().head(11)[1:]

user_name
Justin Pingatore    0.183503
Evan Christensen    0.183503
Matt Davis          0.422650
Joshua Shockley     0.422650
Christi Worstell    0.422650
Chris Stewart       0.422650
Chris Marsh         0.422650
Igor K              0.422650
Mark Tjaden         0.422650
Luke Perkerwicz     0.422650
Name: AKA Surfer, dtype: float64

'Justin Pingatore' and 'Evan Christensen' match up very well with 'AKA Surfer'. All of the top 10 users have a distance below 0.5, so each shares at least one rated trail with 'AKA Surfer'.

In [32]:
# Creating a user search term:
# This will bring up users containing any part of the search term. 
ut_pivot = ut_users.pivot_table(index='user_name', columns= 'trail_name', values = 'binary_rate')

search = "Fred"
users = ut_pivot[ut_pivot.index.str.contains(search)].index
for user in users:
    print(user)
    print("Average Rating: ", ut_pivot.loc[user, :].mean())
    print("Number of Ratings: ", ut_pivot.T[user].count())
    print("")
    print("10 Closest Users")
    print(ut_user_rec[user].sort_values()[1:11])
    print("")
    print("*"*35)
  

Brian Fredricksen
Average Rating:  1.0
Number of Ratings:  2

10 Closest Users
user_name
Donny O'Neill     0.292893
Andrew Ozmun      0.292893
Wesley LeFevre    0.292893
Russell Ochoa     0.292893
Chris Sarot       0.292893
eric clark        0.292893
Brandon Tuttle    0.292893
Cat Sales         0.500000
Hayley Kemp       0.500000
Lloyd McFarlin    0.500000
Name: Brian Fredricksen, dtype: float64

***********************************
Fred Hudso
Average Rating:  1.0
Number of Ratings:  1

10 Closest Users
user_name
Alex Leibold            0.422650
Jon Zanone              0.422650
Justin Steele           0.422650
Chad Hackley            0.500000
John Connolly           0.905084
Michael Martori         1.000000
Michelle Hoffer         1.000000
Michelle Manke-Horat    1.000000
Miguel Suarez           1.000000
Mike Anderson           1.000000
Name: Fred Hudso, dtype: float64

***********************************
Freddy Calk
Average Rating:  1.0
Number of Ratings:  1

10 Closest Users
user_name