### 1. Project Scope
VimeoOTT sellers have the option to offer their subscribers an initial free trial of 7, 14, or 30 days. Once the free trial ends, subscribers may convert to a paying subscription or cancel their subscription entirely. For this project, I'd like to build a machine learning model that predicts the probability that a user will convert at the end of their free trial. I expect the most important predictors to be the length of the free trial, view time (how long a user spent watching content during their free trial), and the platform on which they began their subscription.
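
As a rough illustration of the target and the hypothesized predictors, a baseline could simply compare observed conversion rates by trial length and platform before any model is fit. The sketch below assumes a one-row-per-subscription table with a 0/1 `converted` column; that table and all of its values are hypothetical and would have to be derived from the datasets described below.

```python
import pandas as pd

# Hypothetical one-row-per-subscription table; every value here is made up.
trials = pd.DataFrame({
    "free_trial_days": [7, 7, 7, 14, 14, 30, 30, 30],
    "platform": ["web", "web", "ios", "web", "ios", "web", "web", "ios"],
    "converted": [1, 0, 1, 1, 0, 0, 1, 0],
})

# The mean of a 0/1 column is the observed conversion rate.
overall_rate = trials["converted"].mean()

# Conversion rate by trial length alone.
by_length = trials.groupby("free_trial_days")["converted"].mean()

# Conversion rate and sample size for each (trial length, platform) segment.
by_segment = trials.groupby(["free_trial_days", "platform"])["converted"].agg(
    rate="mean", n="size"
)

print(overall_rate)
print(by_length)
print(by_segment)
```

Segment rates like these give a naive probability estimate that any fitted model should at least match.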

### 2. Datasets
#### a. Subscription Events
subscription_events logs every event during a user's subscription, including the subscription's creation, renewals, and cancellation. This table also logs whether a user was in an initial free trial, the length of the free trial, and the platform (e.g., iOS, Android, web) on which the user began their subscription.

In [1]:
import pandas as pd
sub_events = pd.read_csv('/Users/villarreala/Desktop/subscription_events.csv', low_memory=False)
sub_events.head()

Unnamed: 0 | id | event_type | ticket_id | created_at | site_id | paying | free_trial | platform | campaign | free_trial_days
---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---
0 | 2235107 | renewed | 2924923 | 9/5/17 20:41 | 28998 | + | - | web | (null) | 7
1 | 2235111 | charge_failed | 2925017 | 9/5/17 20:41 | 20369 | (null) | (null) | web | (null) | 7
2 | 2235113 | charge_failed | 2337940 | 9/5/17 20:41 | 8342 | (null) | (null) | web | (null) | 30
3 | 2235116 | renewed | 2340434 | 9/5/17 20:42 | 18790 | (null) | (null) | web | 07116e938f-MHz_Choice_CANCELLED_Subs_2017_Prem... | (null)
4 | 2235119 | renewed | 2253712 | 9/5/17 20:42 | 20255 | (null) | (null) | web | (null) | 30


Variable | Description | Type of Variable
---| ---| ---
id| unique identifier|
event_type | type of event | categorical 
ticket_id | subscription identifier | categorical
created_at | event timestamp | timestamp
site_id | subscription's corresponding site | categorical
paying | `+` = began paying, `-` = stopped paying | categorical
free_trial | `+` = began free trial, `-` = ended free trial | categorical
platform | platform where user onboarded | categorical
campaign | associated campaign | categorical
free_trial_days | length of free trial |categorical
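
To show how a conversion label might be derived from these columns, here is a minimal sketch. It assumes that a subscription counts as converted if any of its events has `paying` equal to `+`; that rule, the `created` and `cancelled` event names, and the sample rows are illustrative assumptions rather than confirmed behavior of the table.

```python
import pandas as pd

# Hypothetical rows shaped like subscription_events; the values are made up.
events = pd.DataFrame({
    "ticket_id": [101, 101, 102, 102, 103, 103],
    "event_type": ["created", "renewed", "created", "cancelled",
                   "created", "charge_failed"],
    "free_trial": ["+", "-", "+", "-", "+", "-"],
    "paying": ["(null)", "+", "(null)", "(null)", "(null)", "(null)"],
    "platform": ["web", "web", "ios", "ios", "android", "android"],
    "free_trial_days": ["7", "7", "30", "30", "14", "(null)"],
})

# The export stores missing values as the string "(null)";
# coercing to numeric turns those entries into NaN.
events["free_trial_days"] = pd.to_numeric(events["free_trial_days"], errors="coerce")

# Keep only subscriptions that began a free trial.
trial_tickets = events.loc[events["free_trial"] == "+", "ticket_id"].unique()
trials = events[events["ticket_id"].isin(trial_tickets)].copy()

# Assumption: a subscription converted if any of its events has paying == "+".
trials["is_paying"] = trials["paying"] == "+"

# One row per subscription: the label plus two candidate predictors.
labels = trials.groupby("ticket_id").agg(
    converted=("is_paying", "any"),
    platform=("platform", "first"),
    free_trial_days=("free_trial_days", "first"),
)
print(labels)
```

Here `first` takes the first non-missing value per subscription, so a `(null)` trial length on a later event does not overwrite the value recorded earlier.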

#### b. Video Events
video_events logs events as a user watches a video: first play, a timeupdate every 10 seconds, skips, and ends. This table shows how long a user watched a specific video and how far they got through it, as well as how many videos they watched within a specific session or timeframe.

In [2]:
video_events = pd.read_csv('/Users/villarreala/Desktop/video_events.csv', low_memory=False)
video_events.head()

Unnamed: 0 | id | name | TIMESTAMP | seconds | video_id | user_id | city | country | region | session_id | timecode | duration | platform | s3_date
---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---
0 | 3968266663 | timeupdate | 1483953341 | 10 | 113467 | 2078676 | (null) | ES | 54 | android:113da848314bc9a3 | 873110 | 2186 | android | 1/9/17
1 | 3968266665 | timeupdate | 1483953342 | 10 | 56685 | 2678928 | Walnut Creek | US | CA | NW0M7V9T-6PCO-OC2B-WONY-SDNCI7CCTMUW | 2200 | 5369 | appletv | 1/9/17
2 | 3968266666 | timeupdate | 1483953342 | 10 | 57478 | 2647666 | Bellevue | US | WA | 02E8-4A2F-3917-9927-876B | 2960 | 3374 | roku | 1/9/17
3 | 3968266667 | timeupdate | 1483953342 | 10 | 65455 | 2691695 | Toledo | US | OH | Roku:9BD6-1DC7-ED99-34AC-960E | 12730 | 28903 | roku | 1/9/17
4 | 3968266669 | timeupdate | 1483953343 | 10 | 47897 | 2507400 | Adissan | FR | A9 | iPad:B0E774A5-AC17-4933-B864-F50B62BBE7EA | 10 | 609 | ipad | 1/9/17


Variable | Description | Type of Variable
---| ---| ---
id| unique identifier|
name | type of event | categorical 
TIMESTAMP | event timestamp (Unix epoch seconds) | timestamp
seconds | 10 during timeupdate events | categorical
video_id | video identifier | categorical
user_id | user identifier | categorical
city | viewing city | categorical
country | viewing country | categorical
region | viewing region | categorical
session_id | id for user's session | categorical
timecode | where user is in video | continuous
duration | duration of video | continuous
platform | viewing platform | categorical
s3_date | viewing date | date
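
View time per user could be approximated by summing `seconds` over `timeupdate` events, as in the sketch below. The sample rows are made up, and the event names `firstplay` and `ended` are placeholders, since only `timeupdate` appears in the sample output above.

```python
import pandas as pd

# Hypothetical rows shaped like video_events; only a few columns are included.
video = pd.DataFrame({
    "name": ["firstplay", "timeupdate", "timeupdate", "timeupdate",
             "ended", "timeupdate", "timeupdate"],
    "TIMESTAMP": [1483953340, 1483953341, 1483953351, 1483953361,
                  1483953362, 1483953342, 1483953352],
    "seconds": [0, 10, 10, 10, 0, 10, 10],
    "video_id": [113467, 113467, 113467, 113467, 113467, 56685, 56685],
    "user_id": [2078676, 2078676, 2078676, 2078676, 2078676, 2678928, 2678928],
})

# TIMESTAMP holds Unix epoch seconds; convert it to a timezone-aware datetime.
video["event_time"] = pd.to_datetime(video["TIMESTAMP"], unit="s", utc=True)

# Each timeupdate event represents roughly 10 seconds of playback.
updates = video[video["name"] == "timeupdate"]

# Total seconds watched and number of distinct videos per user.
view_time = updates.groupby("user_id").agg(
    view_seconds=("seconds", "sum"),
    videos_watched=("video_id", "nunique"),
)
print(view_time)
```

Restricting this to the free-trial window would also require a key linking `user_id` to `ticket_id`, which does not appear in either sample shown here.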

### 3. Domain Knowledge
I support VimeoOTT on a daily basis, so I have a working knowledge of the dataset and the problem at hand. However, no work has been done so far on this initiative. After some initial research, I found useful documentation on this kind of model (see: https://www.strong.io/blog/predicting-customer-behavior-machine-learning-to-identify-paying-customers).

### 4. Project Concerns
One concern/question I have is whether subscribers across all sites will exhibit the same conversion behavior. VimeoOTT hosts a wide range of sellers (in terms of genre and content library), which may need to be taken into account in this model.

Another concern I have is not having access to support data. Presumably users who contact support more frequently are having problems with the product and are less likely to convert.

I don't foresee any potential costs associated with an inaccurate model, as any initiatives taken as a result would not lower conversion rates.

Conversely, I think VimeoOTT sellers would benefit from an accurate model because (a) understanding which users are more likely to convert would help them develop targeted marketing campaigns, and (b) understanding which users are less likely to convert would allow them to tailor support and promotions to high-risk users (i.e., users most likely to churn).

### 5. Outcomes

I (and my target audience, the VimeoOTT team) expect the output to be the probability that a user converts from a free trial to a paying subscription.

Success for this project would look like higher conversion rates for VimeoOTT sellers. At the same time, this project will only provide sellers with the data they need to take action; ultimately, it is up to them to take these results into consideration.