# Data Engineer Stream Hatchet Technical Test
<br>

Hello Stream Hatchet Crew! I hope you enjoy this: 



<br>


****

## 1. You are given the following SQL tables:

<br>

a) streamers: it contains time series data, at a 1-min granularity, of all the channels that broadcast on
Twitch. The columns of the table are:

<br>

 * username: Channel username
 * timestamp: Epoch timestamp, in seconds, corresponding to the moment the data was captured
 * game: Name of the game that the user was playing at that time
 * viewers: Number of concurrent viewers that the user had at that time
 * followers: Number of total followers that the channel had at that time

<br>

b) games_metadata: it contains information of all the games that have ever been broadcasted on Twitch.
The columns of the table are:

<br>

* game: Name of the game
* release_date: Timestamp, in seconds, corresponding to the date when the game was released
* publisher: Publisher of the game
* genre: Genre of the game

<br>


I am using a DBeaver sample database (SQLite) to check my results, which is why the queries below use SQLite's `strftime` function. <br>
I created both tables as follows:

<br>

```mysql

CREATE TABLE `streamers` (
  `username` varchar(64) NOT NULL,
  `timestamp` datetime NOT NULL,
  `game` varchar(64) NOT NULL,
  `viewers` integer NOT NULL,
  `followers` integer NOT NULL
);

CREATE TABLE `games_metadata` (
  `game` varchar(64) NOT NULL,
  `release_date` datetime NOT NULL,
  `publisher` varchar(64) NOT NULL,
  `genre` varchar(64)
);



```










## Write an SQL query to:

<br>

#### 1. Obtain, for each month of 2018, how many streamers broadcasted on Twitch and how many hours of content were broadcasted. The output should contain **month**, **unique_streamers** and **hours_broadcast**.


<br>


```mysql 

SELECT strftime('%m', `timestamp`) AS month,
       COUNT(DISTINCT username) AS unique_streamers,
       COUNT(*) / 60.0 AS hours_broadcast
FROM streamers
WHERE strftime('%Y', `timestamp`) = '2018'
GROUP BY month;

```

<br>

<br>

First we extract the month with the `strftime` function, both to display it and to aggregate the data by it. Since we only want the months of 2018, we filter on the year of the timestamp in the **WHERE** clause. <br>
We use **COUNT(DISTINCT username)** to obtain the number of different streamers in each month.<br>
The data is captured on a per-minute basis, so duplicated timestamps are valid: there will usually be several streams live at the same time.<br>
My approach is to count all rows (duplicates included), since each row represents one minute of broadcast by one channel, and divide the count by 60 to get the number of hours.
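
To sanity-check the row-counting idea, here is a minimal sketch that runs the same aggregation against an in-memory SQLite database from Python. The sample rows are made up for illustration, and `COUNT(*)` is used here as an equivalent way of counting one row per broadcast minute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE streamers "
    "(username TEXT, timestamp TEXT, game TEXT, viewers INTEGER, followers INTEGER)"
)

# Hypothetical sample: two channels live during the same 60 minutes in January 2018,
# plus one row from 2019 that the year filter must exclude.
rows = []
for minute in range(60):
    ts = "2018-01-15 20:{:02d}:00".format(minute)
    rows.append(("alice", ts, "Fortnite", 100, 1000))
    rows.append(("bob", ts, "Fortnite", 50, 500))
rows.append(("alice", "2019-01-01 00:00:00", "Fortnite", 100, 1100))
conn.executemany("INSERT INTO streamers VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT strftime('%m', timestamp) AS month,
       COUNT(DISTINCT username) AS unique_streamers,
       COUNT(*) / 60.0 AS hours_broadcast
FROM streamers
WHERE strftime('%Y', timestamp) = '2018'
GROUP BY month
"""
result = conn.execute(query).fetchall()
print(result)  # [('01', 2, 2.0)]
```

The 120 sample rows (2 channels x 60 minutes) give 2 hours of broadcast content, shared between 2 unique streamers.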



<br>
<br>
<br>










#### 2. Obtain the Top 10 streamers that have percentually gained more followers during January 2019, and that primarily stream FPS games. The output should contain the **username** and **follower_growth**.

<br>


```mysql

SELECT username,
       ((MAX(followers)*1.0-MIN(followers)*1.0)/MIN(followers)*1.0) AS follower_growth
FROM (
    SELECT username, followers, genre, "timestamp"
    FROM (
        SELECT *
        FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
    )
    WHERE strftime('%Y',"timestamp") = '2019'
      AND strftime('%m',"timestamp") = '01'
      AND genre = 'FPS'
)
GROUP BY username
ORDER BY follower_growth DESC
LIMIT 10;


```

The first thing we need is an inner join between the two tables, so that we can distinguish FPS from non-FPS games.<br>

<br>

**SELECT \* FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game**


<br>

Now we run a subquery on the joined result we just "created", selecting only what we need: **username** to display and to group by, **followers** to calculate the growth, **genre** to use as the condition for FPS games, and **timestamp** to keep only January 2019.


<br>

On this "newly created table" (it's not a table, only a query, but we can think of it as a table because we are going to query it again), we use:

<br>

**WHERE strftime('%Y',"timestamp") = '2019' and strftime('%m',"timestamp") = '01' and genre = 'FPS'**

<br>

to keep only January 2019 and FPS games.

<br>

**SELECT username, ((MAX(followers)*1.0-MIN(followers)*1.0)/MIN(followers)*1.0) AS follower_growth**

<br>

To calculate the growth we use the formula above (the multiplication by 1.0 casts the integers to decimals, so the division is not truncated).

<br>


**GROUP BY username Order by follower_growth DESC LIMIT 10**

<br>

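And of course we need to aggregate and order the data as requested. Note that the **genre = 'FPS'** filter only keeps the rows where an FPS game was being streamed, so it approximates "primarily stream FPS games" rather than checking each channel's dominant genre.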
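
As a worked example of the growth formula, here is a small sketch in Python. The function names and the sample follower counts are made up for illustration. It also shows a first-to-last variant, because (MAX - MIN) / MIN ignores the order of the samples and would report positive growth for a channel that lost followers during the month.

```python
def follower_growth(followers):
    """(MAX - MIN) / MIN, as in the query above.

    Returns None when MIN is 0, mirroring SQLite, where division by zero yields NULL.
    """
    low, high = min(followers), max(followers)
    if low == 0:
        return None
    return (high - low) / low


def first_to_last_growth(followers):
    """Alternative: growth between the first and the last sample of the period."""
    first, last = followers[0], followers[-1]
    if first == 0:
        return None
    return (last - first) / first


growing = [1000, 1100, 1250]    # hypothetical follower counts, in time order
shrinking = [1250, 1100, 1000]

print(follower_growth(growing))         # 0.25
print(follower_growth(shrinking))       # 0.25 (the loss is hidden)
print(first_to_last_growth(shrinking))  # -0.2
```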


<br>
<br>
<br>

#### 3. Obtain the Top 10 publishers that have been watched the most during the first quarter of 2019. The output should contain publisher and hours_watched.

<br>

##### Note: Hours watched can be defined as the total amount of hours watched by all the viewers combined. Ie: 10 viewers watching for 2 hours will generate 20 Hours Watched.


<br>
<br>

```mysql

-- Each row is one minute of one channel, so it contributes `viewers` viewer-minutes.
SELECT B.publisher,
       SUM(A.viewers) / 60.0 AS hours_watched
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE strftime('%Y', A.`timestamp`) = '2019'
  AND strftime('%m', A.`timestamp`) IN ('01', '02', '03')
GROUP BY B.publisher
ORDER BY hours_watched DESC
LIMIT 10;

```
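
The example from the note (10 viewers watching for 2 hours generate 20 Hours Watched) can be checked with a short sketch. With 1-minute granularity every sample contributes `viewers` viewer-minutes, so summing the viewers and dividing by 60 gives the hours watched; in SQL this corresponds to `SUM(viewers) / 60.0`. The function name and the sample data are assumptions made for illustration.

```python
def hours_watched(viewers_per_minute):
    """Total hours watched, given one concurrent-viewer count per 1-minute sample."""
    return sum(viewers_per_minute) / 60


# 10 concurrent viewers for 2 hours = 120 one-minute samples of 10 viewers each.
samples = [10] * 120
print(hours_watched(samples))  # 20.0
```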

<br>
<br>
<br>
<br>

****


# 2.

<br>


*"Imagine a new streaming platform has recently launched. They provide an API endpoint that allows
third-parties to obtain, at any given time, the list of all the channels broadcasting in the platform, how
many concurrent viewers each channel has, what game is each channel playing, etc.<br>
At Stream Hatchet we want to capture that information and offer it to our clients through our web app,
providing rankings of top-performing streamers and games for each day, week, month, etc. <br>
Explain, in detail, how would you design and implement a system that is able to achieve that. From the
data gathering to serving the information to the web app so that the end user can consume it, detail
how you would implement each step, focusing on scalability and reliability.<br>
Describe what specific technologies, frameworks, and tools you would use and how you would deploy
the system on a cloud-native infrastructure."*

<br>
<br> 

https://www.youtube.com/watch?v=skc-ZEU9kO8


To solve this problem I would implement a **background job** running on Heroku's **worker dynos**.
 
<br>

Background jobs can **dramatically improve the scalability of a web app** by enabling it to offload slow or CPU-intensive tasks from its front-end. This helps ensure that the front-end can handle incoming web requests promptly, reducing the likelihood of performance issues that occur when requests become backlogged.

<br>
<br>

### **When do we use Background Jobs?**

<br>
<br>

 * **Communicating with an external API or service**
 
 <br>
 
 * Performing resource-intensive data manipulation, such as image or video processing

<br>
<br>


### **Order of operations** <br>

<br>

The following are the high-level steps for handling a request that uses a background job:<br><br><br>

   1. A client sends an app a request to perform a task that is well suited to a background job.<br><br>
   2. The app’s front-end (known on Heroku as the web process) receives the request. It adds the task to a job queue and immediately responds to the client. The response indicates that the result of the request is pending, and it optionally includes a URL the client can use to poll for the result.<br><br>
   3.  A separate app process (known on Heroku as a worker process) notices that a task was added to the job queue. It takes the task off of the queue and begins performing it.<br><br>
   4.  When the worker process completes the task, it persists the outcome of the task. For example, in the case of uploading a file to Amazon S3, it might persist the file’s S3 URL.<br><br>
   5.  The client polls the app on a regular basis until the task is completed and the result is obtained.<br><br>
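
The steps above can be sketched with the Python standard library alone. This is only an illustration of the flow: `queue.Queue` stands in for the Redis-backed job queue, a dictionary stands in for persistent storage, and the names `submit`, `worker` and `results` are made up for this example.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stands in for the Redis-backed job queue
results = {}           # stands in for persistent storage of job outcomes


def submit(task, *args):
    """Web process: enqueue the task and return a job id immediately (step 2)."""
    job_id = str(uuid.uuid4())
    results[job_id] = {"status": "pending", "value": None}
    jobs.put((job_id, task, args))
    return job_id


def worker():
    """Worker process: take tasks off the queue and persist the outcome (steps 3-4)."""
    while True:
        job_id, task, args = jobs.get()
        results[job_id] = {"status": "done", "value": task(*args)}
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

job_id = submit(sum, [1, 2, 3])  # step 1: the client asks for some slow work
jobs.join()                      # a real client would poll instead (step 5)
print(results[job_id])           # {'status': 'done', 'value': 6}
```

In a real deployment the web process and the worker would be separate processes, which is why the queue and the results need to live in an external store.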
   
   
   
To implement this system in Python I would use the **RQ (Redis Queue)** library with a **Redis server**.

<br>

* **RQ (Redis Queue) is a simple Python library for queueing jobs and processing them in the background with workers. It is backed by Redis and it is designed to have a low barrier to entry. It can be integrated in your web stack easily.**



<br>
<br>
   
Installation is simple; just run the following command:

<br>
   
```sh 
pip install rq
```


<br>
<br>

Next we create a **worker** script, which listens for queued tasks and processes them as they are received.

<br>

Then we just create a module that requests the data from the API.

<br>

## Data gathering 

<br>

For the data gathering part, I'd just use Python's **requests** library. Here is an example that gets data from the ISS (Open Notify) API.

<br>

### Simple GET request to retrieve information from the OpenNotify API. 
<br>

Here is a short tutorial on how to request data from an API.

```python
import requests

response = requests.get("http://api.open-notify.org/iss-now.json")

# Print the status code of the response.
print(response.status_code)
```

```
200
```

This means the request succeeded and we are connected to the API.

<br>

Each status code means something: <br>

<br>

* 200 — everything went okay, and the result has been returned (if any)
* 301 — the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 401 — the server thinks you’re not authenticated. This happens when you don’t send the right credentials to access an API.
* 400 — the server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 403 — the resource you’re trying to access is forbidden — you don’t have the right permissions to see it.
* 404 — the resource you tried to access wasn’t found on the server.
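
As a small aside, the standard reason phrase for each of these codes is available from Python's standard library, which is handy when logging failed requests. The helper name `describe_status` is made up for this sketch.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase for an HTTP status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # The code is not a registered HTTP status.
        return "Unknown status"


for code in (200, 301, 400, 401, 403, 404):
    print(code, describe_status(code))
```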

<br>
<br>


```python
response = requests.get("http://api.open-notify.org/iss-pass.json")
print(response.status_code)
```

```
400
```


Here the server thinks we made a bad request. As stated above, this probably means that we are not sending the right data along with the request.<br>
If you look at the [API Documentation](http://open-notify.org/Open-Notify-API/), you'll see that the ISS Pass endpoint requires two parameters.<br>
<br>
The ISS Pass endpoint returns when the ISS will next pass over a given location on Earth; to do this it of course needs the latitude and longitude of the chosen location.<br>
<br>
You can put the parameters directly into the URL, as in http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74, or set them up as a dictionary.
<br>
Since we are in Barcelona (41.3851° N, 2.1734° E), let's see when the ISS will pass over us!



```python
# Set up the parameters we want to pass to the API.
# This is the latitude and longitude of Barcelona.
parameters = {"lat": 41.3851, "lon": 2.1734}
# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)
# Print the content of the response (the data the server returned)
display(response.content.decode("utf-8"))

print("\n \n \n")

type(response.content.decode("utf-8"))
```

```
'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1568206691, \n    "latitude": 41.3851, \n    "longitude": 2.1734, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 608, \n      "risetime": 1568238441\n    }, \n    {\n      "duration": 638, \n      "risetime": 1568244221\n    }, \n    {\n      "duration": 577, \n      "risetime": 1568250095\n    }, \n    {\n      "duration": 584, \n      "risetime": 1568255953\n    }, \n    {\n      "duration": 646, \n      "risetime": 1568261761\n    }\n  ]\n}\n'

str
```

We can see that the server returns a string.<br>

<br>

Strings are how information is passed back and forth to APIs, but it is hard to get the information we want out of a raw string.<br>
A much better way to work with the data is through JSON.<br><br>
JSON is a way to encode data structures such as lists and dictionaries as strings that machines can read easily. It is the primary format in which data is passed back and forth to APIs, and most API servers send their responses in JSON.<br><br>
Python supports JSON through a built-in module called `json`.<br><br>

The `json` module converts lists and dictionaries to JSON strings, and JSON strings back to lists and dictionaries. To do this it has two main functions:

  * **dumps** — Takes in a Python object, and converts it to a string.
  * **loads** — Takes a JSON string, and converts it to a Python object.
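
A minimal round trip through both functions looks like this. The dictionary is a shortened, hand-written stand-in for the API response shown earlier.

```python
import json

# Shortened stand-in for the ISS Pass response.
passes = {"message": "success", "response": [{"duration": 608, "risetime": 1568238441}]}

encoded = json.dumps(passes)   # Python object -> JSON string
decoded = json.loads(encoded)  # JSON string -> Python object

print(type(encoded).__name__)              # str
print(decoded["response"][0]["duration"])  # 608
print(decoded == passes)                   # True
```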


<br>
<br>
 
### Getting JSON from an API request 
You can get the content of a response as a Python object by using the `.json()` method on the response.
<br>

```python
# Make the same request as before, again with the coordinates of Barcelona.
parameters = {"lat": 41.3851, "lon": 2.1734}
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)
# Get the response data as a Python object. Verify that it's a dictionary.
data = response.json()
print(type(data))
print(data)
```

```
<class 'dict'>
{'message': 'success', 'request': {'altitude': 100, 'datetime': 1568206691, 'latitude': 41.3851, 'longitude': 2.1734, 'passes': 5}, 'response': [{'duration': 608, 'risetime': 1568238441}, {'duration': 638, 'risetime': 1568244221}, {'duration': 577, 'risetime': 1568250095}, {'duration': 584, 'risetime': 1568255953}, {'duration': 646, 'risetime': 1568261761}]}
```


```python
print(data['request']['altitude'])
```

```
100
```


### Content type 
<br>

We can also access the response metadata, which contains information on how the data was generated and how to decode it. This metadata is stored in the response headers, which we can access through the `headers` attribute.
<br>

The `headers` attribute returns a dictionary. The most relevant key-value pair for extracting data is `Content-Type`, since it tells you which type of data the server is returning (in this case, JSON).



```python
# Headers is a dictionary
print(response.headers)
# Get the content-type from the dictionary.
print(response.headers["content-type"])
```

```
{'Server': 'nginx/1.10.3', 'Date': 'Wed, 11 Sep 2019 12:58:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '522', 'Connection': 'keep-alive', 'Via': '1.1 vegur'}
application/json
```


So we'd wrap the API request in a module that the worker process can call.
<br>

In our application, we **create an RQ queue**
<br>

and enqueue the function call (our GET request), so that a worker picks it up and runs it in the background.
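
The enqueue step could look roughly like the sketch below. It is written so that it runs without Redis: `job_queue` is only assumed to expose `enqueue(func, *args, **kwargs)` the way `rq.Queue` does, and `http_get` is assumed to behave like `requests.get`. The URL, `fetch_channels`, `schedule_fetch`, `InlineQueue` and `FakeResponse` are all hypothetical names used for illustration.

```python
API_URL = "https://api.example.com/channels"  # hypothetical endpoint


def fetch_channels(url, http_get=None):
    """Job body: call the platform API and return the decoded JSON payload."""
    if http_get is None:
        import requests  # only needed when no stand-in is supplied
        http_get = requests.get
    response = http_get(url)
    response.raise_for_status()
    return response.json()


def schedule_fetch(job_queue, url, **kwargs):
    """Enqueue the API call; job_queue is expected to behave like rq.Queue."""
    return job_queue.enqueue(fetch_channels, url, **kwargs)


class FakeResponse:
    """Stand-in for a requests response, used only in this demo."""

    def raise_for_status(self):
        pass

    def json(self):
        return {"channels": [{"name": "demo", "viewers": 42}]}


class InlineQueue:
    """Stand-in for rq.Queue: runs the job immediately instead of queueing it.
    A real rq.Queue returns a job handle, not the result."""

    def enqueue(self, func, *args, **kwargs):
        return func(*args, **kwargs)


payload = schedule_fetch(InlineQueue(), API_URL, http_get=lambda url: FakeResponse())
print(payload)
```

With RQ installed and a Redis server running, `job_queue` would instead be created with `Queue(connection=Redis())`, and the result would be read from the job once a worker has processed it.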


<br>

# Deployment

<br>

Add the worker process to the Procfile in the root of the project.

<br>

Then, provision an instance of Redis with the Redis To Go addon and deploy with a git push.


```shell


$ heroku addons:create redistogo
$ git push heroku master

```


<br>
Once everything’s pushed up you can scale your workers according to your needs:

```shell
$ heroku ps:scale worker=1
```


<br>


The same could also be achieved with Celery. There are many ways of implementing a system like this, and the choice also depends on which cloud our web app is hosted on and on the framework used for its backend.

<br>

<br><br>
****
<br><br>
# 3.
<br><br>


<br>

*"A 4-year-old is trying to build a tub for his goldfish out of Lego. Every Lego piece is stuck to the piece to
its left and its right (except for the first and last one). All the pieces have a width of 1 unit.
<br>Write a program, using the programming language of your choice, that given the heights (in units) of the
lego pieces from left to right, outputs the total amount of water held over the pieces that the kid built."*




<br><br>
<br><br>


For this question I decided to make a short 5-minute YouTube clip detailing my approach. I hope you enjoy it!


<br>

<br>





[<img src="youtubesnap.png">](https://www.youtube.com/channel/UCW6qBfYgb-w1EiplkuQf9hA?view_as=subscriber)



<br>

# Implementation:

<br>
<br>

```python
def max_min_rl_lr(*args):
    """
    Given the heights passed by the user, returns the running maximum of the
    heights scanned from the left, the running maximum scanned from the right,
    and the original heights as a list.

    e.g. input 3, 1, 3, 4

    outputs : rl = [3, 3, 3, 4]  (highest piece at or to the left of each position)
            : lr = [4, 4, 4, 4]  (highest piece at or to the right of each position)

    :param *args: int
    :return: rl, lr, og (lists)
    """
    rl = list()  # running maximum, scanning the input from the left
    lr = list()  # running maximum, scanning the input from the right
    big_rl = None  # holder for the current maximum
    big_lr = None  # holder for the current maximum
    og = list(args)
    for w in args:  # loop through the arguments
        # If the holder is still None (beginning of the loop), or the current
        # element is bigger than the holder, the holder takes the new value.
        if big_rl is None or w > big_rl:
            big_rl = w
        rl.append(big_rl)  # append the maximum seen so far

    # The same as before, but with the input reversed
    for w in reversed(args):
        if big_lr is None or w > big_lr:
            big_lr = w
        lr.append(big_lr)

    # Reverse the list again so that it lines up with the original order
    return rl, list(reversed(lr)), og
```

```python
def draw_tank(original):
    """
    Draws the original configuration

    :param original: list
    :return: None
    """
    print("\n Original Configuration is : \n")

    # Loop from the tallest height down to 1, so we draw row 3, row 2, row 1.
    for row in range(max(original), 0, -1):
        # Print a '#' where the piece reaches this row, otherwise an empty space.
        print(' '.join(['#' if height >= row else ' ' for height in original]))
```


```python
def min_of_two_lists(l1, l2):
    """
    Given two lists, iterates through both in parallel and stores the minimum
    of each pair in a new list.

    :param l1: list
    :param l2: list
    :return: list
    """
    # List comprehension: loops through both lists in parallel with zip
    # and keeps the minimum element of each pair.
    return [min(j) for j in zip(l1, l2)]
```

```python
import numpy as np


def total_water(original, minimum):
    """
    Subtracts the original configuration from the list that holds the minimum
    of the two peak lists, and sums the differences to get the volume of water.

    :param original: list
    :param minimum: list
    :return: tuple (total volume, list with the water held over each piece)
    """
    sub_list = list(np.subtract(minimum, original))
    return np.sum(sub_list), sub_list
```


```python
rl, lr, og = max_min_rl_lr(3, 1, 1, 2)

min_two = min_of_two_lists(rl, lr)

draw_tank(og)

print("\nOriginal configuration: {}".format(og))
print("\n\nRight to left peaks: {}".format(rl))
print("\nLeft to right peaks: {}".format(lr))

print("\n\nMinimum of {} and {} : \n\n {}".format(rl, lr, min_two))

water, dif = total_water(og, min_two)

print("\n\n The total amount of water held by the configuration is : \n \n {}".format(water))
```



```
 Original Configuration is : 

#      
#     #
# # # #

Original configuration: [3, 1, 1, 2]


Right to left peaks: [3, 3, 3, 3]

Left to right peaks: [3, 2, 2, 2]


Minimum of [3, 3, 3, 3] and [3, 2, 2, 2] : 

 [3, 2, 2, 2]


 The total amount of water held by the configuration is : 
 
 2
```
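
For comparison, the same total can be computed in a single pass with two pointers. This is a different technique from the peak-list approach above (it is not what the video walks through): it keeps only the highest piece seen so far from each side instead of building two lists. The function name is made up for this sketch.

```python
def trapped_water(heights):
    """Total water held over the pieces, using two pointers moving inwards."""
    left, right = 0, len(heights) - 1
    left_peak = right_peak = 0
    water = 0
    while left < right:
        if heights[left] <= heights[right]:
            # The right side is at least as tall, so the left peak limits the water.
            left_peak = max(left_peak, heights[left])
            water += left_peak - heights[left]
            left += 1
        else:
            # The left side is taller, so the right peak limits the water.
            right_peak = max(right_peak, heights[right])
            water += right_peak - heights[right]
            right -= 1
    return water


print(trapped_water([3, 1, 1, 2]))  # 2
```

It gives the same result for the configuration used above while using constant extra memory.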


****

<br>

# 4 

<br>

## a)  Focusing on one or two sections of your choice, explain what insights you can extract from the data that is being represented.  

<br><br>


## **Channel Searcher Section** 

<br><br>


I found this section very useful for streamers who want to take an analytical approach to their stream performance, and for sponsors who are looking for successful streamers to promote their brand or product!
<br>

<br>
With a simple look, the user can draw a couple of important KPIs, such as:

<br>

*  Which **game** draws more **viewership** and produces more **followers** and **hours watched**.

*  What's the **best combination of games** to play in a session

*  Which is the **best time to stream** a certain game, or just to stream overall

*  The best **stream titles**

<br>

These KPIs can be extracted over different time ranges, so they can be evaluated by day, week or month.


<br> 

### Shroud Channel Example

<br>

<img src="shroud.png">




<br>


Here we have a brief statistical summary of the channel's performance for the month of July. <br>

<br>

This is of interest to the streamer as well as to potential sponsors!

<br> Taking a more detailed look:



<img src="shroudmetrics.png">



<br><br>


We can see that the most played game during the month of **August** was **World of Warcraft**; it also has the most followers and new viewers. However, **Call of Duty**, the least broadcast game, had the best **Peak & AVG CCVs** and also recorded **half the number of new followers** with only **5%** of the broadcast air time of **WoW**.

<br>

With **this information** streamers can choose which games to play in order to **prioritize** the **metric** they wish to improve as well as **measure the impact a game has on the audience**. 


<br><br>

## **Rankings Section** 



<br><br>

In this section, we can analyse the different KPIs across platforms! <br>
<br>
We know Twitch is the main platform for streaming gaming content, but **sometimes sheer viewership** doesn't tell us the whole story!
<br>
The other performance metrics can help users evaluate platform performance in different ways. For example, a sponsor might be more interested in **overall platform statistics**, while a streamer might be more interested in **per-channel statistics**.

<br>
<br>

<img src="HoursWatched.png">

<br>
<br>

<br><br>

A look at the number of unique channels in each platform shows us another perspective:


<img src ="UniqueChannels.png">

Microsoft's streaming platform Mixer has shown steady growth, even overtaking Twitch this past month!

<br>
<br>

However I'd like to pose a **question**:

<br>

* **Do more content creators make for a healthier and more active platform?**

<br>

Using only logic, one would infer that having more content creators is always a good thing. However, we can also pose this **question**:

<br>


* **Is this increase in active content creators benefiting or hurting the content creators and the platform itself?**

<br>



<img src ="avgCCV_channel.png">


Too many streamers is not such a good thing after all, and it **may** be causing **an AVG CCV per channel of less than 1 viewer**.<br>
<br>
Facebook has hands down the best values: it has far fewer channels than Mixer or Twitch, resulting in an excellent AVG CCV for its streamers! <br>


<br> 
<br>
And when looking at **channels' most popular streaming moments, YouTube is on par with Twitch**.

<br>
 
<img src ="PeakCCVChannel.png">
 
 
<br>
<br>

To summarise, **this section offers different perspectives on channel performance** and lets the users of **Stream Hatchet's platform** decide which platform is more suitable for them!


<br>
<br>



# Bonus: Rankings Section

<br>
<br>

**I know that time is precious for you guys at Stream Hatchet, so feel free to skip this!** 



<img src="average_concurrent_platform.png">

<br>
<br>

We can see the enormous impact that Twitch has on the streaming scene, as well as its consistent growth over the years.
It's no surprise that Twitch dominates the scene, as it started years before its main competitors and has built a strong brand over the years.<br><br>

This has led to Twitch having over **1 million more average concurrent viewers** than its main competitor, YouTube, which is an enormous gap. We can also note that in 1 year and 9 months **Twitch** has **doubled** its average concurrent viewers, from **double YouTube's best record (~400k)** to a whopping **1.33570** recorded this August. <br> <br>
One interesting seasonal pattern in Twitch's **average concurrent viewers** over the years (especially since 2015) is the **growth** in the **month of January**! This is highly noticeable **in 2018**, where it had its **biggest growth ever recorded!**  <br>

<br>

My first thought was that most **publishers**, in order to maximize their sales, tend to **publish games during the Christmas holidays**. However, I didn't find data to support this hypothesis.<br>

<br>

<br>

<br>



<img src="table.png">


<br>
<br>



The performance of the ELEAGUE channel in January 2018 confirms this.


<br>
<br>

<br>







<img src ="ELEAGUES.png">




<br>
<br>

<br> <br> 




**My question**: Is the ELEAGUE Major tournament causing a peak CCV across the whole platform, or is this a seasonal holiday pattern?
<br><br>
<img src ="PeakJan.png">


<br>

The **ELEAGUE CS Major final** took place on Jan 29th, which also corresponds to the highest recorded peak of the month.
<br>
Now taking a look at the Peak CCV of the whole platform:


<img src ="PeakViewersTwitch.png">



<br> <br> 

<br>

So sponsors and advertisers beware, since this may not be a seasonal pattern caused by the holidays!


<br>


# Games Section - Vanilla WoW Analysis! (Bonus)

<br>

From this section one can derive various insights, mainly the **popularity of a game not only in viewership** but also in the number of **content creators**! I'm going to analyse the recent resurgence of WoW!

<br> 

In 2018 Fortnite dominated the streaming scene!

<br> 


<img src ="Top102018.png">

<br>

The sheer magnitude of Avg Channels and Airtime isn't comparable to any other game! There was so much broadcast content, and so many creators, that Fortnite was hands down the most popular game **not only in viewership** but also with **content creators**!


<br>

In **2019, Fortnite still dominates** the scene, ranking at number one every month! But something **changed** this past month, something that **fans from all over the world have been craving for years and years** (me included)!


<br>

Taking a look at the overall statistics of the month of August: 

<br>


<img src ="games_august2019.png">

<br>


<br>
<br>

Fortnite still dominates; however, **WoW, which ranked #13 in July**, suddenly **jumped to #3**!

<br>


<br>

In **Aug 19th to Aug 25th - Week 34, 2019**, WoW was still ranked **#11**, with a **share of 2.5%**!

<br>


<img src ="Wow#11.png">




<br>

**What happened?**

<br>


On the **26th of August**, the greatly anticipated release of the official **Vanilla WoW servers** happened! <br>

Was this a massive disruption on Twitch? **YES**, take a look at the following week's data, **26th August to 1st September - Week 35, 2019**!

<br>

<img src ="26-1Sept.png">

<br>

Despite Fortnite having a similar amount of broadcast time and content creators, **WoW dominated the viewership reach**, with a staggering **difference of 36 million hours watched** over Fortnite, reaching a **22.4% share** of the entire platform!

<br>

This past week recorded over **31 million hours watched**. Take notice that in **July Fortnite clocked in 88 million hours** of content watched; in just these past two weeks **WoW clocked in 83 million hours**!
This is probably due to the **hype built over the years** for this release; even I, an ex-WoW player, have been dreaming of this day for the past 5-6 years! <br>
<br>
Vanilla WoW was one of the most important online multiplayer game releases. My theory is that this great shift in viewership over the past 2 weeks comes from the **nostalgia factor**, but is it enough to steal Fortnite's thunder?

<br> 
<br> 

##  Does this pose a real threat, or is it just hype?

<br> 

<br>

One of WoW's biggest problems is the **dead months** before the release of new content. Right now it's hot, since it was released two weeks ago, but my prediction is that **in time Fortnite** will **resurface as the biggest player in the streaming scene again**. Not only that, but **it also appeals to a younger audience of new gamers**, while **Vanilla WoW's main demographic is older**. <br>

Only time will tell, but I don't think that **Vanilla WoW** has the power to retain massive viewership for a long period of time (despite being awesome)!

Vanilla WoW content will be released in 4 stages, so it'd be safe to predict that during these releases of "new old content" we should see a shift in viewership on Twitch!


<br> 
<br> 

***


## 2)

<br>
<br>

*"Propose a new section for the BI that offers a different perspective.
   Assume that all the metrics present in the BI (game genres, publishers, channels, tournaments,chat, etc.) are available for all the platforms and date ranges.<br>What new insights could a business extract from this new section?"*


<br><br>


A new section that I think would be interesting, and that could be integrated into the BI right now with the data available, is a **viewer-dedicated section**. <br>
<br>
One of the **main goals of streaming** is delivering content to the maximum number of viewers possible, maximizing the reach, whether for a **personal brand** (the streamer's name) or a **sponsor brand**. <br>
**Knowing your audience** looks like an important step in order to maximize your stream and brand value. This section could answer a **series of questions** like:


<br>
<br>
<br>

* **What's the average Twitch viewer's sentiment towards a specific brand, stream or game, based on chat rooms?**

<br>

* **How many active users are on the platform?**

<br>

* **What's the average length of stay of a viewer on a specific Twitch channel or game?**


<br>

* **What's the percentage of total viewers by game genre?**

<br>

* **What's the busiest time of the day, week, month and year?**

<br>


* **Top 10 words/expressions used in chat rooms**

<br>

* **Top 10 languages used by the audience in chat rooms (English, Spanish)**



<br> <br>




The data is already on the platform. **There seems to be a section for everybody: games, publishers, channels, brands.** The one that seems to be missing is the **audience** section. <br>


<br>

I think this is a section with huge potential and interest for all parties involved in the platform!


# An extra feature for the channel search


<br>

**Channel's chat sentiment level** - In the **search** section we can find the **chat**; there we can search for a single channel and get all of the messages that were posted within the date range entered by the user.

<br>


**Message Reach** -

<img src ="chat.png">


<br> <br>

**The ratio, as a percentage, between the overall message views of the session's chat and the total number of viewers on the channel**


<br> <br>


****

<br> <br>

## 3) Looking at the metrics that are available in the BI, think of a dataset(s) that you would use to apply Machine Learning to extract new information.
### Explain what techniques you would use and how the new information would be valuable.<br><br>






There is an interesting opportunity to apply Machine Learning techniques to the **Rankings data set**, especially for Twitch. While analyzing Twitch data, I found an interesting **seasonal pattern** for the month of January. We also have a rich data set for Twitch, with 7 years of data at monthly **granularity** for all years and daily granularity for this year. <br>
It could be interesting to try different **time series models** here. If done correctly, we could even **predict platform peaks**, for instance, which would prove useful for users who want to maximize their viewership reach!    <br>
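
As a starting point before trying more elaborate time series models, a seasonal naive forecast (repeat the value observed one season earlier) makes a useful baseline that any other model should beat. The sketch below is only an illustration: the function name is made up, and the monthly figures are invented numbers with a January bump, not real Stream Hatchet data.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step as the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    start = len(history) - season_length  # first index of the last full season
    return [history[start + step % season_length] for step in range(horizon)]


# Invented monthly average CCV figures (in thousands), January first.
history = [900, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800]

# Forecast the next three months (January to March of the following year).
print(seasonal_naive_forecast(history, season_length=12, horizon=3))  # [900, 700, 710]
```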



Ideas -


* Predict air time based on previous streams of the same content, the type of game and the day of the week.

* Predict 

<br>



