# Summary Notes

## Business Challenge

<b>Aim</b>
<br>Identify user profiles from TFL cycle-hire data to drive a targeted marketing campaign.</br>

<br><b>Theory</b>
<br>Pairing geography with psychology is a powerful tool for influencing behaviour.</br></br>

## Infrastructure Requirements

<b>Google Cloud Platform</b>

<li>Generate a service-account key on GCP and download it as a JSON file.</li>
<li>Pass the absolute path of the JSON file to initialise the BigQuery client in the notebook.</li>
<li>Import various Python libraries:
    <ul>
        <li><i>google.cloud</i> - connect to BigQuery</li>
        <li><i>pandas</i> - data wrangling</li>
        <li><i>os</i> - get file path</li>
        <li><i>math</i>, <i>numpy</i> - data analysis</li>
        <li><i>matplotlib</i> - data visualisation</li>
        <li><i>folium</i> - HTML street map</li>
        <li><i>sklearn, hdbscan</i> - clustering analysis</li>
    </ul>
</li>
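
A minimal sketch of the client set-up, assuming the `google-cloud-bigquery` package is installed and the key file is called `tfl-key.json` (a hypothetical name). `Client.from_service_account_json` is one documented way to authenticate with a service-account key; pointing the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at the same file and calling `bigquery.Client()` is an alternative.

```python
import os


def resolve_key_path(filename, base_dir=None):
    """Return the absolute path of the service-account JSON key file."""
    base_dir = base_dir or os.getcwd()
    return os.path.abspath(os.path.join(base_dir, filename))


def make_bigquery_client(key_path):
    """Build a BigQuery client authenticated with a service-account JSON key."""
    # Imported here so resolve_key_path() works even without google-cloud-bigquery installed.
    from google.cloud import bigquery

    return bigquery.Client.from_service_account_json(key_path)
```

In the notebook this would be called as `client = make_bigquery_client(resolve_key_path("tfl-key.json"))`; keep the key file out of version control.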

## Data Analysis

<b>Visual Exploration</b>

<li>To visualise user-behaviour on a map, we have to plot co-ordinates of their journeys.</li>
<li>This requires a SQL JOIN between two datasets (hires and stations) to get the co-ordinates of the start and end stations for each hire.</li>
<li>Trigonometry is used to calculate the straight-line distance between the two stations (not the route actually ridden).</li>
<li>Dividing this distance by the hire duration gives the average speed of each journey, which can then be aggregated by starting station.
    <ul><li>User speed is mapped to the size of the bubbles on a street map.</li>
        <li>User speed is also mapped to the colour of the bubbles, producing a RAG (red/amber/green) scale:
            <ul>
                <li>small, green = "slow"</li>
                <li>medium, amber = "medium"</li>
                <li>big, red = "fast"</li>
            </ul>
        </li>
    </ul>
</li>
<li>On visual inspection, slow users appear to cluster around areas such as Hyde Park and Kensington.</li>
<li>Faster users appear to cluster between King's Cross and the City of London.</li>
<li>This may support a 'Leisure vs Rush' hypothesis, i.e. that users fall into distinct categories.</li>
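
The join, distance and speed steps above can be sketched as follows. This is a stand-in under stated assumptions, not the notebook's code: the table and column names assume the public `bigquery-public-data.london_bicycles` dataset, the haversine formula is assumed as the trigonometry, `duration` is assumed to be in seconds, and the RAG thresholds are placeholders.

```python
import math
from collections import defaultdict

# Assumed source tables and columns (public London bicycles dataset), not taken from the notebook.
# The stations table is joined twice: once for the start station, once for the end station.
JOIN_SQL = """
SELECT h.start_station_id, h.duration,
       s.latitude AS start_lat, s.longitude AS start_lon,
       e.latitude AS end_lat,   e.longitude AS end_lon
FROM `bigquery-public-data.london_bicycles.cycle_hire` AS h
JOIN `bigquery-public-data.london_bicycles.cycle_stations` AS s ON h.start_station_id = s.id
JOIN `bigquery-public-data.london_bicycles.cycle_stations` AS e ON h.end_station_id = e.id
"""

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Placeholder speed bands in km/h (upper bound, label, colour); the real cut-offs are not stated.
BANDS = [(8.0, "slow", "green"), (14.0, "medium", "amber"), (math.inf, "fast", "red")]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ('as the crow flies') distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_speed_by_start_station(rows):
    """Average journey speed (km/h) per start station from joined hire rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        km = haversine_km(r["start_lat"], r["start_lon"], r["end_lat"], r["end_lon"])
        if km == 0 or r["duration"] <= 0:
            continue  # skip round trips and invalid durations
        sums[r["start_station_id"]] += km / (r["duration"] / 3600)
        counts[r["start_station_id"]] += 1
    return {sid: sums[sid] / counts[sid] for sid in sums}


def rag_band(speed_kmh):
    """Return the (label, colour) band for a speed, used to colour the map bubbles."""
    for upper, label, colour in BANDS:
        if speed_kmh < upper:
            return label, colour
```

Round trips are skipped because a journey that starts and ends at the same station has zero straight-line distance, so its computed speed says nothing about how fast the rider actually went.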

<b>Statistical Formalisation</b>

<li>The aim was to locate the decision boundaries between speed groups without relying on manual visual inspection.</li>
<li>Overplotting is an issue: with 500 stations and 1000 results, the pigeonhole principle guarantees that some data points stack on top of each other, making them hard to see.</li>
<li>There is no data on whether a user <b>intended</b> the journey to be "slow", "medium" or "fast", so unsupervised ML (clustering) was used instead. The clusters show:
    <ul>
        <li>"fast" users located mostly between King's Cross and Pall Mall.</li>
        <li>"slow" users located mostly in W and SW London.</li>
    </ul>
</li>
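
To illustrate how clustering can place the slow/medium/fast boundaries without eyeballing the map, here is a dependency-free 1-D k-means sketch (Lloyd's algorithm) on average speed per station. It is a stand-in, not the notebook's method: the notebook lists `sklearn` and `hdbscan`, and HDBSCAN in particular is density-based and does not need the number of clusters up front. The speeds in the example are made up.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Cluster one numeric feature with Lloyd's algorithm; returns sorted centroids."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    xs = sorted(values)
    # Deterministic start: centroids spread evenly across the sorted data.
    centroids = [xs[(len(xs) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return sorted(centroids)


def decision_boundaries(centroids):
    """In 1-D, nearest-centroid boundaries are midpoints between adjacent centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


# Made-up mean speeds (km/h) per start station, for illustration only.
speeds = [5, 6, 7, 11, 12, 13, 20, 21, 22]
centres = kmeans_1d(speeds)
print(centres, decision_boundaries(centres))  # → [6.0, 12.0, 21.0] [9.0, 16.5]
```

For this toy data, stations below 9 km/h would be labelled "slow", 9 to 16.5 km/h "medium" and above 16.5 km/h "fast".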

<b>Additional Time-Dependency Considerations</b>

<li>In peak summer, Hyde Park is the most popular station, whereas stations in office areas become more popular towards winter.</li>
<li>This possibly reinforces the 'Leisure vs Rush' hypothesis, in which users can be distinctly categorised by their journey choices.</li>
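
The seasonal comparison can be sketched by counting hires per month and start station, then picking the busiest station for each month. This is a standard-library stand-in (the notebook lists pandas for data wrangling); the input shape, `(start_date, station_name)` pairs, and the sample station names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def top_station_by_month(hires):
    """Map month number (1-12) to its most-hired start station."""
    per_month = defaultdict(Counter)
    for start_date, station in hires:
        per_month[start_date.month][station] += 1
    # most_common(1) returns [(station, count)] for the busiest station.
    return {month: counts.most_common(1)[0][0] for month, counts in sorted(per_month.items())}


# Hypothetical sample hires.
sample = [
    (datetime(2017, 7, 1), "Hyde Park Corner"),
    (datetime(2017, 7, 2), "Hyde Park Corner"),
    (datetime(2017, 7, 3), "Waterloo Station"),
    (datetime(2017, 12, 4), "Waterloo Station"),
    (datetime(2017, 12, 5), "Waterloo Station"),
    (datetime(2017, 12, 6), "Hyde Park Corner"),
]
print(top_station_by_month(sample))  # → {7: 'Hyde Park Corner', 12: 'Waterloo Station'}
```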

<b>Station Popularity Distribution</b>

<li>The top 10 cycle stations accounted for 5% of all hires.</li>
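<li>With 500 stations, an average station takes about 0.2% of hires, while each top-10 station takes about 0.5% on average, so the top stations are roughly 2.5× as popular as the average station.</li>
<li>Several of these top cycle stations group around the same TFL station, such as Hyde Park and Waterloo.</li>
<li>The distinct TFL stations they group around are: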
    <ul>
        <li>Hyde Park</li>
        <li>King's Cross</li>
        <li>The Borough</li>
        <li>Waterloo</li>
        <li>Liverpool Street</li>
    </ul>
</li>
<li>I predict that placing posters at these stations, especially Hyde Park in summer, will deliver the most effective impressions.</li>
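
The 2.5× figure follows from the top-10 share and the station count. A small sketch of both calculations; the function names are illustrative, not from the notebook.

```python
from collections import Counter


def top_n_share(station_counts, n=10):
    """Fraction of all hires taken by the n busiest stations."""
    total = sum(station_counts.values())
    top = sum(count for _, count in Counter(station_counts).most_common(n))
    return top / total


def lift_over_average(share, n, num_stations):
    """How many times as popular an average top-n station is as an average station."""
    # An average top-n station takes share / n of hires; an average station takes 1 / num_stations.
    return share * num_stations / n


print(lift_over_average(0.05, 10, 500))  # → 2.5
```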

## Recommended Directives

<b>"The Commuter"</b>

<li>Appears to be a person who travels quickly around office areas and high-traffic commuting networks.</li>
<li>Make the cycles appealing to them by fulfilling their needs for utility and convenience.</li>
<li>Most effective stations to advertise in: King's Cross, Waterloo, Liverpool Street, The Borough.</li>
<li><i>"Beat the Bus. Avoid traffic and reduce your commute time by 25% on Santander Cycles."</i></li>

<b>"The Park User"</b>

<li>Appears to be a person who chooses to use the park for leisure, especially in summer.</li>
<li>Make the cycles appealing to them through the lens of exploring the park better.</li>
<li>They may also be the type to appreciate the benefits of exercise.</li>
<li>Most effective stations to advertise in will be Hyde Park (and possibly Queensway, Lancaster Gate).</li>
<li><i>"Nothing beats the breeze in my hair while cycling over Serpentine Bridge." - [Relatable Person]</i></li>
<li><i>"Can't get 10k steps? Try 30 minutes of cycling with Santander at London's best green space."</i></li>