# Looking at YouTube view count growth for popular music videos

## Strategy: 

1. We will write two scripts. The first collects the title and URL of each music video (MV). The second records each video's view count at regular intervals over a chosen time period, and each record should also carry the video's URL.
2. The list of videos comes from YouTube's search results for the `#music video` query.

## Importing our libraries 

In [1]:
import pandas as pd
import random
import time

from selenium.webdriver import Chrome
from pymongo import MongoClient 
from datetime import datetime

## Initializing test browser

In [2]:
browser = Chrome()

## Connect to MongoDB (localhost)

In [3]:
connection = MongoClient()
db = connection['music_scraping']

## Get Music Video List on YouTube

Get the music video list from YouTube and store it in MongoDB.

In [4]:
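# Locator constants for find_elements (the find_elements_by_* helpers were removed in Selenium 4.3)
from selenium.webdriver.common.by import By

# Open the YouTube search results page
url = "https://www.youtube.com/results?search_query=%23music+video"
browser.get(url)
time.sleep(3)  # crude wait for the results to render

# Get video title & URL
videolinks = browser.find_elements(By.ID, 'video-title')
videos = [(link.text, link.get_attribute('href')) for link in videolinks]

# Store to the MongoDB collection 'videos'
coll = db['videos']
for title, video_url in videos:  # video_url avoids shadowing the search url above
    coll.insert_one({'title': title, 'url': video_url})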

In [5]:
coll.count_documents({})

20
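
Re-running the scraping cell inserts the same videos again, and some matched elements may carry no text or `href`. The following is a minimal sketch of cleaning the `(title, url)` pairs before they are stored. The helper name `build_video_docs` and the sample URLs are hypothetical and are not part of the original notebook.

```python
def build_video_docs(pairs):
    """Turn (title, url) pairs into documents, skipping blanks and repeated URLs."""
    seen = set()
    docs = []
    for title, url in pairs:
        if not title or not url:
            continue  # element had no visible text or no href
        if url in seen:
            continue  # same video already collected
        seen.add(url)
        docs.append({'title': title.strip(), 'url': url})
    return docs


# Placeholder data standing in for the scraped `videos` list
sample = [
    ('Song A (Official Music Video)', 'https://www.youtube.com/watch?v=aaa'),
    ('', None),
    ('Song A (Official Music Video)', 'https://www.youtube.com/watch?v=aaa'),
    ('Song B (Official Music Video)', 'https://www.youtube.com/watch?v=bbb'),
]
docs = build_video_docs(sample)
print(len(docs))
```

The resulting list could then be passed to `coll.insert_many(docs)` in place of the per-item `insert_one` calls, assuming the list is not empty.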

## Define get_view_count function

Open a YouTube video URL and read the view count from the page.

In [6]:
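from selenium.webdriver.common.by import By

def get_view_count(browser, url):
    """Open the video page and return its view count with a timestamp."""
    browser.get(url)
    time.sleep(3)  # give the page time to render
    now = datetime.now()
    sel = 'span.view-count'
    # find_element_by_css_selector was removed in Selenium 4.3
    view_count = browser.find_element(By.CSS_SELECTOR, sel).text

    # Keep only the digits of the displayed text
    return {
        'view_count': int(''.join(n for n in view_count if n.isdigit())),
        'timestamp': now,
    }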
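
The digit-filtering step can be tested without a browser. The sketch below isolates it in a hypothetical helper, `parse_view_count`, and assumes the page shows the full number (for example `1,757,152,189 views`). An abbreviated label such as `1.7B views` would be misread as 17, so this approach only suits the full-number format.

```python
import re


def parse_view_count(text):
    """Extract the integer from text such as '1,757,152,189 views'."""
    digits = re.sub(r'\D', '', text)  # drop everything that is not a digit
    if not digits:
        raise ValueError(f'no digits found in {text!r}')
    return int(digits)


print(parse_view_count('1,757,152,189 views'))  # 1757152189
```

Raising an error on empty text makes a changed page layout visible immediately instead of storing a bogus count.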

## Get video list from MongoDB & Get view count every 10 mins

In [7]:
# connect to mongodb and get all videos
connection = MongoClient()
db = connection['music_scraping']
coll = db['videos']
cur = coll.find({})

In [8]:
# Every 10 minutes, record the current view count of each video.
# A cursor is exhausted after one pass, so query the videos again on every round.
videos_coll = db['videos']
counts_coll = db['view_count']

while True:
    for video in videos_coll.find({}):
        print(video['title'])
        count = get_view_count(browser, video['url'])
        counts_coll.insert_one({
            'title': video['title'],
            'view_count': count['view_count'],
            'timestamp': count['timestamp']
        })
    time.sleep(600)


Dua Lipa - New Rules (Official Music Video)
Marshmello - Alone (Official Music Video)
Trey Songz - Animal [Official Music Video]
Cardi B - Money [Official Music Video]
Hozier - Almost (Sweet Music) (Official Video)
Marshmello - Tell Me (Official Music Video)
Post Malone - "Wow." (Official Music Video)
Lil Dicky - Freaky Friday feat. Chris Brown (Official Music Video)
6IX9INE "Gotti" (WSHH Exclusive - Official Music Video)
Rudy Mancuso - Mama (Official Music Video)
Marshmello & Anne-Marie - FRIENDS (Music Video by Sofie Dossi)
XXXTENTACION - BAD! (Official Music Video)
Camila Cabello - Havana (Official Music VIdeo) ft. Young Thug
Katy Perry - Making of “Bon Appétit” Music Video ft. Migos
Shakira - Can't Remember to Forget You (Official Music Video) ft. Rihanna
Lorn - Acid Rain (Official Music Video)
Little Mix - Black Magic (Official Music Video)
Lele Pons - Celoso (Official Music Video)
Marshmello - Blocks (Official Music Video)
Lele Pons & Fuego - Bloqueo (Official Music Video)
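
A MongoDB cursor can be iterated only once, so a polling loop needs a fresh query on every pass, and one failing page should not stop the whole run. The sketch below shows that control flow with stand-ins for the database and browser. The function `poll` and its parameters (`list_videos`, `fetch_count`, `store`, `rounds`, `sleep`) are hypothetical names introduced for illustration.

```python
import time


def poll(list_videos, fetch_count, store, rounds, interval=600, sleep=time.sleep):
    """Run `rounds` passes; each pass re-reads the video list and stores one snapshot per video."""
    for i in range(rounds):
        for video in list_videos():  # fresh iterator on every pass
            try:
                snapshot = fetch_count(video['url'])
            except Exception as exc:
                # Skip this video for now and try again on the next pass
                print(f"skipping {video['title']}: {exc}")
                continue
            store({'title': video['title'], **snapshot})
        if i < rounds - 1:
            sleep(interval)  # no wait needed after the final pass


# Stand-ins: an in-memory list instead of MongoDB, a fake fetcher instead of Selenium
videos = [{'title': 'Song A', 'url': 'u1'}, {'title': 'Song B', 'url': 'u2'}]
rows = []
poll(
    list_videos=lambda: iter(videos),
    fetch_count=lambda url: {'view_count': 1, 'timestamp': None},
    store=rows.append,
    rounds=3,
    sleep=lambda seconds: None,  # skip the real 10-minute wait in this demo
)
print(len(rows))  # 6
```

With the notebook's objects, `list_videos` could be `lambda: db['videos'].find({})`, `fetch_count` could wrap `get_view_count(browser, url)`, and `store` could be `db['view_count'].insert_one`.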


## Check the result

In [9]:
coll = db['view_count']
# query = {'title': 'Taylor Swift - ME! (feat. Brendon Urie of Panic! At The Disco)'}
query={}
cur = coll.find(query)

df = pd.DataFrame(list(cur))

In [10]:
df.head()

| | _id | timestamp | title | view_count |
|---|---|---|---|---|
| 0 | 5cc3dc3a1fc070103bd52faa | 2019-04-26 21:36:10.482 | Dua Lipa - New Rules (Official Music Video) | 1757152189 |
| 1 | 5cc3dc3f1fc070103bd52fab | 2019-04-26 21:36:14.978 | Marshmello - Alone (Official Music Video) | 1297324712 |
| 2 | 5cc3dc421fc070103bd52fac | 2019-04-26 21:36:18.951 | Trey Songz - Animal [Official Music Video] | 63851539 |
| 3 | 5cc3dc471fc070103bd52fad | 2019-04-26 21:36:22.998 | Cardi B - Money [Official Music Video] | 67298340 |
| 4 | 5cc3dc4a1fc070103bd52fae | 2019-04-26 21:36:26.710 | Hozier - Almost (Sweet Music) (Official Video) | 1456968 |


In [13]:
coll = db['view_count']
query = {"title": "Dua Lipa - New Rules (Official Music Video)"}
cur = coll.find(query)

df = pd.DataFrame(list(cur))
df

| | _id | timestamp | title | view_count |
|---|---|---|---|---|
| 0 | 5cc3dc3a1fc070103bd52faa | 2019-04-26 21:36:10.482 | Dua Lipa - New Rules (Official Music Video) | 1757152189 |
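
A single snapshot, as in the result above, says nothing about growth yet. Once several snapshots per video are stored, the rate between consecutive ones can be computed. The sketch below uses plain Python and made-up numbers purely for illustration. The function name `views_per_hour` is hypothetical, and the records are assumed to have the same `timestamp` and `view_count` fields as the `view_count` collection.

```python
from datetime import datetime


def views_per_hour(snapshots):
    """Return (timestamp, views gained per hour) for each pair of consecutive snapshots."""
    ordered = sorted(snapshots, key=lambda s: s['timestamp'])
    rates = []
    for prev, curr in zip(ordered, ordered[1:]):
        hours = (curr['timestamp'] - prev['timestamp']).total_seconds() / 3600
        if hours <= 0:
            continue  # ignore duplicate timestamps
        gained = curr['view_count'] - prev['view_count']
        rates.append((curr['timestamp'], gained / hours))
    return rates


# Illustrative values only, not measured data
sample = [
    {'timestamp': datetime(2019, 4, 26, 21, 36), 'view_count': 1000},
    {'timestamp': datetime(2019, 4, 26, 21, 46), 'view_count': 1050},
    {'timestamp': datetime(2019, 4, 26, 21, 56), 'view_count': 1090},
]
rates = views_per_hour(sample)
for ts, rate in rates:
    print(ts, round(rate))
```

The documents returned by `coll.find(query)` could be passed in directly as `views_per_hour(list(cur))`.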


## Close the browser

In [14]:
browser.quit()  # close every window and end the WebDriver session