# Data Engineer Stream Hatchet Technical Test 
<br>

Hello Stream Hatchet Crew! I hope you enjoy this: 



<br>


****

## 1. You are given the following SQL tables:

<br>

a) streamers: it contains time series data, at a 1-min granularity, of all the channels that broadcast on
Twitch. The columns of the table are:

<br>

 * username: Channel username
 * timestamp: Epoch timestamp, in seconds, corresponding to the moment the data was captured
 * game: Name of the game that the user was playing at that time
 * viewers: Number of concurrent viewers that the user had at that time
 * followers: Number of total followers that the channel had at that time

<br>

b) games_metadata: it contains information of all the games that have ever been broadcasted on Twitch.
The columns of the table are:

<br>

* game: Name of the game
* release_date: Timestamp, in seconds, corresponding to the date when the game was released
* publisher: Publisher of the game
* genre: Genre of the game

<br>


I am using a DBeaver sample database to check my results! <br>
I created both tables as follows:

<br>

```mysql

CREATE TABLE `streamers` (
  `username` varchar(64) NOT NULL,
  `timestamp` datetime NOT NULL,
  `game` varchar(32) NOT NULL,
  `viewers` integer NOT NULL,
  `followers` integer NOT NULL
  
);



CREATE TABLE `games_metadata`(

    `game` varchar(64) NOT NULL,
    `release_date` datetime NOT NULL, 
    `publisher` varchar(64) NOT NULL, 
    `genre` varchar(64) 

);



```










## Write an SQL query to:

<br>

#### 1. Obtain, for each month of 2018, how many streamers broadcasted on Twitch and how many hours of content were broadcasted. The output should contain **month**, **unique_streamers** and **hours_broadcast**.


<br>


```mysql 

SELECT strftime('%m', `timestamp`) AS month,
       COUNT(DISTINCT username) AS unique_streamers,
       COUNT(*) / 60.0 AS hours_broadcast
FROM streamers
WHERE strftime('%Y', `timestamp`) = '2018'
GROUP BY month

```

<br>

<br>

First we extract the month with the strftime function, both to display it and to aggregate the data by month. Since we only want the months of 2018, we filter on the timestamp year in the **WHERE** clause. <br>
We use **COUNT (DISTINCT username)** to obtain the number of different streamers within each month.<br>
The data is captured on a per-minute basis, and duplicated timestamps are valid since you'll most likely have multiple streams live at the same time.<br>
My approach was to count all rows (duplicates included; the datetime format doesn't matter, because this is a time series with 1-minute granularity, so each row represents one minute of broadcast) and divide the count by 60 to get the number of hours.
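The row-counting approach can be checked on a tiny in-memory SQLite database. This is only a sketch: the sample rows are made up, and the table is simplified to plain TEXT/INTEGER columns.

```python
import sqlite3

# Hypothetical sample rows: (username, timestamp, game, viewers, followers).
rows = [
    ("alice", "2018-01-01 10:00:00", "Fortnite", 10, 100),
    ("alice", "2018-01-01 10:01:00", "Fortnite", 12, 100),
    ("bob", "2018-01-01 10:00:00", "Dota 2", 5, 50),
    ("bob", "2018-02-03 18:30:00", "Dota 2", 7, 55),
    ("carol", "2017-12-31 23:59:00", "Fortnite", 3, 10),  # outside 2018
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE streamers (username TEXT, timestamp TEXT, game TEXT,"
    " viewers INTEGER, followers INTEGER)"
)
conn.executemany("INSERT INTO streamers VALUES (?, ?, ?, ?, ?)", rows)

# Each row is one minute of broadcast, so rows / 60 gives hours.
query = """
SELECT strftime('%m', timestamp) AS month,
       COUNT(DISTINCT username)  AS unique_streamers,
       COUNT(*) / 60.0           AS hours_broadcast
FROM streamers
WHERE strftime('%Y', timestamp) = '2018'
GROUP BY month
ORDER BY month
"""
result = conn.execute(query).fetchall()
for month, unique_streamers, hours_broadcast in result:
    print(month, unique_streamers, round(hours_broadcast, 4))
```

January has three 1-minute rows from two streamers, so it reports 2 unique streamers and 3/60 hours; the 2017 row is filtered out.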



<br>
<br>
<br>










#### 2. Obtain the Top 10 streamers that have percentually gained more followers during January 2019, and that primarily stream FPS games. The output should contain the **username** and **follower_growth**.

<br>


```mysql

SELECT username,
       ((MAX(followers)*1.0-MIN(followers)*1.0)/MIN(followers)*1.0) AS follower_growth
FROM (SELECT username, followers, genre, "timestamp"
      FROM (SELECT *
            FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game)
      WHERE strftime('%Y',"timestamp") = '2019' and strftime('%m',"timestamp") = '01' and genre = 'FPS')
GROUP BY username
ORDER BY follower_growth DESC
LIMIT 10


```

The first thing we need to do is an inner join of the two tables, to distinguish FPS from non-FPS games.<br>

<br>

**SELECT * FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game**


<br>

Now we run a subquery on the table we just "created", selecting what we need: **username** to display and to group by, **followers** to calculate the growth, **genre** to use as the condition for FPS games, and **timestamp** to keep only January 2019.


<br>

With this "newly created table" (it's not really a table, only a query, but we can think of it as one because we are going to query a query), we use:

<br>

**WHERE strftime('%Y',"timestamp") = '2019' and strftime('%m',"timestamp") = '01' and genre = 'FPS'**

<br>

to filter for January 2019 and FPS games.

<br>

**SELECT username, ((MAX(followers)*1.0-MIN(followers)*1.0)/MIN(followers)*1.0) AS follower_growth**

<br>

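To calculate the growth we use the formula above, i.e. (max - min) / min of the follower counts within the month (the multiplication by 1.0 casts to decimal). Note that the genre filter keeps only the rows captured while an FPS game was being played, which is an approximation of "primarily stream FPS games".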

<br>


**GROUP BY username Order by follower_growth DESC LIMIT 10**

<br>

And of course we need to aggregate and order the data as requested.
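As a quick sanity check of the growth formula, here is a minimal sketch in Python. The usernames and follower counts are made up, and `follower_growth` is an illustrative helper, not part of the SQL answer.

```python
def follower_growth(followers):
    """Relative growth between the lowest and highest follower count."""
    lowest, highest = min(followers), max(followers)
    if lowest == 0:
        return None  # growth from zero followers is undefined
    return (highest - lowest) / lowest


# Hypothetical follower counts sampled during one month.
samples = {
    "alice": [1000, 1040, 1100],
    "bob": [50, 80, 75],
    "carol": [0, 10],
}

growth = {user: follower_growth(counts) for user, counts in samples.items()}

# Rank streamers by growth, skipping the undefined ones.
ranked = sorted(
    (user for user in growth if growth[user] is not None),
    key=growth.get,
    reverse=True,
)
print(ranked)
```

Two things are worth noting about the formula: (max - min) only equals (last - first) when the follower count never drops during the month, and a streamer starting from 0 followers has no defined growth (SQLite returns NULL for a division by zero, so such rows would sort last in a descending order).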


<br>
<br>
<br>

#### 3. Obtain the Top 10 publishers that have been watched the most during the first quarter of 2019. The output should contain publisher and hours_watched.

<br>

##### Note: Hours watched can be defined as the total amount of hours watched by all the viewers combined. Ie: 10 viewers watching for 2 hours will generate 20 Hours Watched.


<br>
<br>

```mysql

SELECT publisher, SUM(viewers) / 60.0 AS hours_watched
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE strftime('%Y', "timestamp") = '2019'
  AND (cast(strftime('%m', "timestamp") as integer) + 2) / 3 = 1
GROUP BY publisher
ORDER BY hours_watched DESC
LIMIT 10 ;

```
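The definition in the note (10 viewers watching for 2 hours generate 20 hours watched) can be checked with a small sketch on an in-memory SQLite database. Each 1-minute row contributes `viewers / 60` hours, so hours watched is the sum of viewers divided by 60. The games, publishers and rows below are invented for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE streamers (username TEXT, timestamp TEXT, game TEXT,"
    " viewers INTEGER, followers INTEGER)"
)
conn.execute(
    "CREATE TABLE games_metadata (game TEXT, release_date TEXT,"
    " publisher TEXT, genre TEXT)"
)
conn.executemany(
    "INSERT INTO games_metadata VALUES (?, ?, ?, ?)",
    [
        ("Game A", "2017-01-01 00:00:00", "Publisher A", "FPS"),
        ("Game B", "2016-01-01 00:00:00", "Publisher B", "MOBA"),
    ],
)

start = datetime(2019, 1, 15, 20, 0, 0)
rows = []
for minute in range(120):  # 2 hours with 10 concurrent viewers
    ts = (start + timedelta(minutes=minute)).strftime("%Y-%m-%d %H:%M:%S")
    rows.append(("alice", ts, "Game A", 10, 100))
for minute in range(30):  # 30 minutes with 4 concurrent viewers
    ts = (start + timedelta(minutes=minute)).strftime("%Y-%m-%d %H:%M:%S")
    rows.append(("bob", ts, "Game B", 4, 50))
rows.append(("bob", "2019-04-01 00:00:00", "Game B", 1000, 50))  # Q2, ignored
conn.executemany("INSERT INTO streamers VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT publisher, SUM(viewers) / 60.0 AS hours_watched
FROM streamers AS A INNER JOIN games_metadata AS B ON A.game = B.game
WHERE strftime('%Y', timestamp) = '2019'
  AND (CAST(strftime('%m', timestamp) AS INTEGER) + 2) / 3 = 1
GROUP BY publisher
ORDER BY hours_watched DESC
LIMIT 10
"""
result = conn.execute(query).fetchall()
print(result)
```

The 120 rows with 10 viewers add up to 1200 viewer-minutes, i.e. 20 hours watched for the first publisher, which matches the example in the note.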

<br>
<br>
<br>
<br>

****


# 2.

<br>


*Imagine a new streaming platform has recently launched. They provide an API endpoint that allows
third-parties to obtain, at any given time, the list of all the channels broadcasting in the platform, how
many concurrent viewers each channel has, what game is each channel playing, etc.<br>
At Stream Hatchet we want to capture that information and offer it to our clients through our web app,
providing rankings of top-performing streamers and games for each day, week, month, etc. <br>
Explain, in detail, how would you design and implement a system that is able to achieve that. From the
data gathering to serving the information to the web app so that the end user can consume it, detail
how you would implement each step, focusing on scalability and reliability.<br>
Describe what specific technologies, frameworks, and tools you would use and how you would deploy
the system on a cloud-native infrastructure.*




<br>
<br> 

In this question, apart from detailing how I would implement the system, I'm going to go over some theory about APIs.


### What is an API?

<br>

A quick Wikipedia search leads us [here](https://en.wikipedia.org/wiki/Application_programming_interface): <br>



*An application programming interface (API) is an interface or communication protocol between a client and a server intended to simplify the building of client-side software. It has been described as a “contract” between the client and the server, such that if the client makes a request in a specific format, it will always get a response in a specific format or initiate a defined action.*

<br>

There is a famous analogy to explain APIs. Imagine you are sitting in a restaurant: how do you satisfy your appetite?<br>
In most cases you have a menu to choose from. In practice you don't really know how each dish is made, and honestly you don't really care either; you are just hungry and want a meal that preferably contains no ingredients you are allergic to. <br>
An API is the messenger (the menu) that takes requests (orders) and tells the system what to do (which dish to cook).<br>

<br>

APIs are hosted on web servers. When you type www.google.com in your browser’s address bar, your computer is actually asking the www.google.com server for a webpage, which it then returns to your browser.<br>
APIs work much the same way, except instead of your web browser asking for a webpage, your program asks for data.


<br>



## Data gathering 

<br>

For data gathering I would use the Python programming language with the well-known requests library. <br>

There are many different types of requests. The most commonly used one, a GET request, retrieves data, which is what we want. <br> <br>

Here I present a brief tutorial of how I would implement it, using data about the ISS (International Space Station); the implementation for a streaming platform would be very similar.



https://datarebellion.com/blog/easily-build-and-deploy-your-first-python-web-app/

https://coderbook.com/@marcus/how-scalable-are-websites-built-in-django-framework/

https://www.freecodecamp.org/news/what-is-an-api-in-english-please-b880a3214a82/

https://www.howtogeek.com/343877/what-is-an-api/

https://www.youtube.com/watch?v=tI8ijLpZaHk

https://www.dataquest.io/blog/python-api-tutorial/

<br>

### Simple GET request to retrieve information from the OpenNotify API.

<br>

In [1]:
import requests 


response = requests.get("http://api.open-notify.org/iss-now.json")

# Print the status code of the response.
print(response.status_code)

200


A 200 status code means the request succeeded and we are connected to the API.

<br>

Each status code means something: <br>



* 200 — everything went okay, and the result has been returned (if any)
* 301 — the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 401 — the server thinks you’re not authenticated. This happens when you don’t send the right credentials to access an API.
* 400 — the server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 403 — the resource you’re trying to access is forbidden — you don’t have the right permissions to see it.
* 404 — the resource you tried to access wasn’t found on the server.
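The codes in the list above can be looked up programmatically with Python's standard `http.HTTPStatus` enum. The sketch below is only an illustration; `describe_status` is a hypothetical helper name.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard phrase and a rough category for a status code."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {phrase} ({kind})"


for code in (200, 301, 400, 401, 403, 404):
    print(describe_status(code))
```

When working with requests directly, calling `response.raise_for_status()` is a common way to turn 4xx and 5xx responses into exceptions instead of checking the code by hand.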



In [2]:
response = requests.get("http://api.open-notify.org/iss-pass.json")
print(response.status_code)

400


Here we see that the server thinks we made a bad request; as stated above, this probably means that we are not sending the right data along with the request!<br>
If you look into the [API Documentation](http://open-notify.org/Open-Notify-API/), you'll see that the ISS Pass endpoint requires two parameters!<br>
<br>
The ISS Pass endpoint returns when the ISS will next pass over a given location on Earth, so it needs the latitude and longitude of your chosen location!<br>
<br>
You can put the parameters directly into the URL, as in http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74, or set them up as a dictionary!
<br>
Since we are in Barcelona (41.3851° N, 2.1734° E), let's see when the ISS will pass over us!



In [3]:
# Set up the parameters we want to pass to the API.
# This is the latitude and longitude of Barcelona.
parameters = {"lat": 41.3851, "lon": 2.1734}
# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)
# Print the content of the response (the data the server returned)
display(response.content.decode("utf-8"))


print("\n \n \n")



type(response.content.decode("utf-8"))


'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1568206691, \n    "latitude": 41.3851, \n    "longitude": 2.1734, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 608, \n      "risetime": 1568238441\n    }, \n    {\n      "duration": 638, \n      "risetime": 1568244221\n    }, \n    {\n      "duration": 577, \n      "risetime": 1568250095\n    }, \n    {\n      "duration": 584, \n      "risetime": 1568255953\n    }, \n    {\n      "duration": 646, \n      "risetime": 1568261761\n    }\n  ]\n}\n'


 
 



str

We can see that the server returns a string.<br>

<br>

Strings are the way we pass information back and forth to APIs, but it's hard to get the information we want out of them.<br>
There's a much better way of handling the data, and that is through JSON.<br><br>
 JSON is a way to encode data structures like lists and dictionaries as strings, so that they are easily readable by machines. JSON is the primary format in which data is passed back and forth to APIs, and most API servers send their responses in JSON format.<br><br>
Python supports JSON through a built-in module called json.<br><br>
 
The json module converts lists and dictionaries to JSON strings, and JSON strings back to lists and dictionaries. To do this, the module has two main functions:

  * **dumps** — Takes in a Python object, and converts it to a string.
  * **loads** — Takes a JSON string, and converts it to a Python object.
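A minimal round trip through both functions, using a shortened copy of the ISS response shown earlier:

```python
import json

# A shortened version of the ISS pass response, as a Python dictionary.
passes = {
    "message": "success",
    "response": [{"duration": 608, "risetime": 1568238441}],
}

encoded = json.dumps(passes)   # Python object -> JSON string
decoded = json.loads(encoded)  # JSON string -> Python object

print(type(encoded))
print(decoded["response"][0]["duration"])
```

After the round trip, `decoded` is a regular dictionary again, so nested values can be read with normal indexing.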


<br>
<br>
 
### Getting JSON from an API request 
You can get the content of a response as a Python object by using the .json() method on the response.
<br>

In [4]:
# Make the same request we did earlier, again with the coordinates of Barcelona.
parameters = {"lat":41.3851, "lon": 2.1734}
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)
# Get the response data as a Python object. Verify that it's a dictionary.
data = response.json()
print(type(data))
print(data)

<class 'dict'>
{'message': 'success', 'request': {'altitude': 100, 'datetime': 1568206691, 'latitude': 41.3851, 'longitude': 2.1734, 'passes': 5}, 'response': [{'duration': 608, 'risetime': 1568238441}, {'duration': 638, 'risetime': 1568244221}, {'duration': 577, 'risetime': 1568250095}, {'duration': 584, 'risetime': 1568255953}, {'duration': 646, 'risetime': 1568261761}]}


In [5]:
print(data['request']['altitude'])

100


### Content type 
<br>

We can also access the response metadata, which contains information on how the data was generated and how to decode it. This metadata is stored in the response headers, which we can access through the headers attribute.
<br>

The headers attribute returns a dictionary-like object. The most relevant key-value pair for extracting data is 'Content-Type', since it tells you which type of data the server returned (in this case, JSON).



In [6]:
# Headers is a dictionary
print(response.headers)
# Get the content-type from the dictionary.
print(response.headers["content-type"])

{'Server': 'nginx/1.10.3', 'Date': 'Wed, 11 Sep 2019 12:58:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '522', 'Connection': 'keep-alive', 'Via': '1.1 vegur'}
application/json


<br>

### Stream Hatchet Streaming use case 

<br>
<br>

If we were getting data from the streaming service through an API, the process would be very similar to what we did above! <br>

<br>

   1. Read the API documentation and see which parameters (if any) need to be passed.
   2. Send a GET request to the streaming service API. 
   3. Look at the response metadata first, to see the data type (most likely JSON).
   4. Finally, get the response data with the json method (if it is JSON), and save it.
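The steps above can be sketched as a small collector. Everything platform-specific here is an assumption: the payload shape (`{"channels": [...]}`), the field names and the helper names `snapshot_rows` and `collect_once` are hypothetical, and the real API documentation would decide them. The HTTP call is passed in as a function so the sketch runs without network access.

```python
import time
from typing import Callable, Dict, List


def snapshot_rows(payload: Dict, captured_at: int) -> List[Dict]:
    """Flatten one (hypothetical) API payload into rows for the streamers table."""
    rows = []
    for channel in payload.get("channels", []):
        rows.append(
            {
                "username": channel["username"],
                "timestamp": captured_at,  # epoch seconds of the capture
                "game": channel.get("game"),
                "viewers": int(channel.get("viewers", 0)),
                "followers": int(channel.get("followers", 0)),
            }
        )
    return rows


def collect_once(
    fetch: Callable[[], Dict], retries: int = 3, wait_seconds: float = 1.0
) -> List[Dict]:
    """Call fetch(), retrying with a growing pause if it fails."""
    last_error = None
    for attempt in range(retries):
        try:
            payload = fetch()
            return snapshot_rows(payload, captured_at=int(time.time()))
        except Exception as error:  # e.g. a timeout or a bad status code
            last_error = error
            time.sleep(wait_seconds * (attempt + 1))
    raise RuntimeError("could not fetch a snapshot") from last_error


def fake_fetch() -> Dict:
    """Stand-in for the real HTTP call, returning a made-up payload."""
    return {
        "channels": [
            {"username": "alice", "game": "Fortnite", "viewers": 120, "followers": 3000}
        ]
    }


rows = collect_once(fake_fetch)
print(rows)
```

In a real deployment `fetch` could wrap something like `requests.get(url, timeout=10).json()` against the platform's documented endpoint, and the collector would run once per minute to match the 1-minute granularity of the `streamers` table.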
   
   
<br>

### Deployment 

<br>

Since we want to deploy on a cloud-native infrastructure focusing on scalability and reliability, I'd use Docker for containerization and Kubernetes as the container-orchestration tool.<br>

So I would package our system as a Docker image and run it in containers; after that, the system is ready to be deployed and managed with Kubernetes.


### What is Docker and Kubernetes? 

<br>

Docker is standalone software that can be installed on any computer to run containerized applications. Containerization is an approach to running applications on an OS such that each application is isolated from the rest of the system. You create an illusion for your application that it is getting its very own OS instance, although there may be other containers running on the same system. Docker is what enables us to create, run and manage containers on a single operating system.<br>

<br>

Kubernetes turns it up to 11, so to speak. If you have Docker installed on a bunch of hosts (different operating systems), you can leverage Kubernetes. These nodes, or Docker hosts, can be bare-metal servers or virtual machines. Kubernetes can then allow you to automate container provisioning, networking, load-balancing, security and scaling across all these nodes from a single command line or dashboard. A collection of nodes that is managed by a single Kubernetes instance is referred to as a Kubernetes cluster.


<br>

**Why a Kubernetes solution?**

<br>

   1. To make the infrastructure more robust: the application stays online even if some of the nodes go offline, i.e. reliability.
   2. To make the application more scalable: if the workload increases, simply spawn more containers and/or add more nodes to your Kubernetes cluster.

<br>

Kubernetes works with Amazon EC2, Azure Container Service, Rackspace, GCE, IBM Software, and other clouds. And it works with bare-metal (using something like CoreOS), Docker, and vSphere. And it works with libvirt and KVM, which are Linux machines turned into hypervisors (i.e, a platform to run virtual machines). <br>
This way you don't need to be stuck with a specific cloud vendor. 

  
  
<br>

<br>

https://thenewstack.io/cloud-native-apps-need-to-be-managed-in-a-completely-new-way/   

https://medium.com/better-practices/deploying-a-scalable-web-application-with-docker-and-kubernetes-a5000a06c4e9

https://kubernetes.io/

https://containerjournal.com/topics/container-ecosystems/kubernetes-vs-docker-a-primer/

http://www.developintelligence.com/blog/2017/02/kubernetes-actually-use/

https://www.scalyr.com/blog/create-docker-image/


***
# 3.

*A 4-year-old is trying to build a tub for his goldfish out of Lego. Every Lego piece is stuck to the piece to
its left and its right (except for the first and last one). All the pieces have a width of 1 unit.
<br>Write a program, using the programming language of your choice, that given the heights (in units) of the
lego pieces from left to right, outputs the total amount of water held over the pieces that the kid built.*


<br>


## My approach: <br> <br>

After some hours of experimentation, I managed to devise an algorithm that achieves what we want.


We first go through our list of heights in order and create a list with the same length as the original, where every element contains the maximum height found so far, i.e.
<br><br>
Imagine we have the following configuration: 3,1,1,1,2
<br><br>
Right -> Left (looking back from each piece towards the left) we'd have the following list: 3,3,3,3,3
<br><br>
Then we'd do the same in the other direction, starting from the last piece:
<br><br>
Left -> Right (looking ahead from each piece towards the right) we'd have the following list: 3,2,2,2,2
<br><br>
We'd take the element-wise minimum of these two lists:
<br><br>
MinList: 3,2,2,2,2
<br><br>
Then we subtract the original configuration from the Min list:
<br><br>
Result = 0,1,1,1,0
<br><br>
Now we only need to sum the elements of the list and we have the total volume (here 3 units).
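The steps just described can be condensed into a few lines with `itertools.accumulate`. This is a compact sketch of the same idea, separate from the step-by-step functions in the notebook cells; `water_held` is an illustrative name.

```python
from itertools import accumulate


def water_held(heights):
    """Total water trapped over Lego pieces of width 1 with the given heights."""
    heights = list(heights)
    if not heights:
        return 0
    # Tallest piece at or to the left of each position.
    left_max = list(accumulate(heights, max))
    # Tallest piece at or to the right of each position.
    right_max = list(accumulate(reversed(heights), max))[::-1]
    # Water above each piece rises to the lower of the two surrounding peaks.
    return sum(min(l, r) - h for l, r, h in zip(left_max, right_max, heights))


print(water_held([3, 1, 1, 1, 2]))
```

For the configuration 3,1,1,1,2 the two running-maximum lists are 3,3,3,3,3 and 3,2,2,2,2, the per-position water is 0,1,1,1,0, and the total is 3.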




### How it works 
<br>

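We first go through the whole list to see which block is the highest at any point, and then do the same backwards. Why? We want the "valleys", and we can have multiple peaks in the same configuration, so that's why we need to go R-L and then L-R!
<br><br>
Next we have to take the minimum of both lists per element. Why? Imagine a config like 4,2,3: using our algorithm we'd get 4,4,4 and 4,3,3. If we subtracted the original from the first list alone, we'd get 0,2,1 (3 units), but the water can only rise as high as the lower of the two walls around it (here 3), so anything above that would spill over the right side. Taking the minimum gives 4,3,3, and subtracting the original gives 0,1,0, i.e. 1 unit of water.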




In [7]:
def max_min_rl_lr(*args):
    """
    Given the heights passed by the user, returns two lists of running
    maxima plus the original heights.
    
    rl[i] is the tallest piece at or to the left of position i,
    lr[i] is the tallest piece at or to the right of position i.
    
    ie - input 3,1,3,4
    
    outputs : rl = [3,3,3,4]
            : lr = [4,4,4,4]
            : og = [3,1,3,4]
    
    :param *args: int  
    :return: rl , lr , og (lists)
     
    """
    rl = list() # running maximum, looking from each position towards the left
    lr = list() # running maximum, looking from each position towards the right
    big_rl = None # holder for the max value seen so far
    big_lr = None # holder for the max value seen so far
    og =  list(args)
    for w in args: # Loops through the heights in input order
       
        # if the holder is still None (beginning of the loop) it takes the first value,
        # and if the current element is bigger than the holder, it becomes the new holder
        if big_rl is None or w > big_rl: big_rl = w 
            
        rl.append(big_rl) # appends the maximum seen so far
            
    
    # The same as before but with the input reversed
    
    for w in list(reversed(args)): # walks the heights from the last piece to the first
    
        if big_lr is None or w > big_lr: big_lr = w
            
        lr.append(big_lr)
    
    # We need to reverse the list again so it lines up with the original order
    
    return rl ,list(reversed(lr)) , og 
    
            
            
    

In [13]:
def draw_tank(original):
    """
    Draws the original configuration 
    
    :params original: list 
    
    :returns: void
    
    """
    print("\n Original Configuration is : \n")
    
    
    
    for row in range(max(original), 0, -1): # Loops from the tallest height down to 1, so we get row 3, row 2, row 1.
        # prints a '#' in each column whose height reaches this row, otherwise an empty space
        print(' '.join(['#' if height >= row else ' ' for height in original]))
    
   


In [14]:
def min_of_two_lists(l1,l2):
    """
    Given two lists, iterates through both in parallel and stores the minimum
    of each pair in a new list.
    
    :param l1: list
    :param l2: list
    :return: list
      
    """
    # list comprehension: loops through both lists in parallel with the zip function,
    # and stores the minimum element per iteration     
    
    return [min(j) for j in zip(l1,l2)]


    
    

In [15]:
def total_water(original,minimum):
    """
    Subtracts the original configuration from the list that holds, per position,
    the minimum of the two peak lists, and sums the differences to get the volume of water.
    
    :param original: list
    :param minimum: list
    :return: tuple (total volume, list with the water held at each position)
    
    """
    import numpy as np

    
    sub_list =list(np.subtract(minimum,original))
    
    return(np.sum(sub_list),sub_list)


In [16]:
rl , lr, og =  max_min_rl_lr(3,1,1,2)

min_two =  min_of_two_lists(rl,lr)

draw_tank(og)


print("\nOriginal configuration: {}".format(og))
print("\n\nRight to left peaks: {}".format(rl))
print("\nLeft to right peaks: {}".format(lr))



print("\n\nMinimum of {} and {} : \n\n {}".format(rl,lr,min_two))

water,dif = total_water(og,min_two)

print("\n\n The total amount of water held by the configuration is : \n \n {}".format(water))





 Original Configuration is : 

#      
#     #
# # # #

Original configuration: [3, 1, 1, 2]


Right to left peaks: [3, 3, 3, 3]

Left to right peaks: [3, 2, 2, 2]


Minimum of [3, 3, 3, 3] and [3, 2, 2, 2] : 

 [3, 2, 2, 2]


 The total amount of water held by the configuration is : 
 
 2


<br>

# 4 

<br>

Take a look at Stream Hatchet’s BI <br><br>



## 1)  Focusing on one or two sections of your choice, explain what insights you can extract from the data that is being represented.  

<br><br>

I decided to take a look at the ranking sections, more specifically at the different streaming platforms available!
<br><br>
Let's take a look at the metrics:

<br>

* **Concurrent Views**: The active audience during a specific time frame of a livestream. Displays the size of an audience viewing the content. 
<br>

* **Average Concurrent Viewers Platform** - Refers to the average audience size throughout a livestream. Provides insight into prolonged interest in the content.
<br>

* **Air time** - Amount of stream time of the platform.

<br>

* **Hours Watched** - Total amount of Hours watched across the platform! 

<br>

* **Unique Channels** - The number of active channels broadcasting their own content. Does not include channels hosting other channels. Displays the total number of users on a streaming platform given set parameters.

<br>

* **Average Concurrent Viewers Channel** - Average Concurrent Viewers per Channels. 

<br>

* **Peak Concurrent Viewers Channel** - Represents the maximum number of concurrent viewers during a livestream. Provides insight into the most popular moment of a livestream event, and thus the most engaging.

<br> 

* **Average Channels Platform** - The average number of channels broadcasting on the platform.

<br>

* **Peak Concurrent Viewers Platform** - Represents the maximum number of concurrent viewers of the entire platform. Provides insight into the most popular moments of the platform, and thus the most engaging.

<br>

* **Peak Channels Platform** - Maximum number of  Channels broadcasting in the platform.  

<br><br>


<br>
<br>
<br>

<img src="average_concurrent_platform.png">

<br>
<br>

We can see the sheer, enormous impact that Twitch has on the streaming scene, as well as its consistent growth over the years.
It's no surprise that Twitch dominates the scene, as it started years before its main competitors and built a strong brand over that time.<br><br>

This has led to over **1 million more average concurrent viewers** than its main competitor, YouTube, which is a huge difference. We can also note that in 1 year and 9 months **Twitch** has **doubled** its average concurrent viewers, from **double YouTube's best record (~400k)** to a whopping **1.33570** recorded this August. <br> <br>
One interesting seasonal pattern in **average concurrent viewers** over the years of Twitch's existence (especially since 2015) is the **growth** in the **month of January**! This is highly noticeable **in 2018**, which had the **biggest growth ever recorded!**  <br>

<br>

My first thought was that most **publishers**, in order to maximize their sales, tend to **publish games during the Christmas holidays**. However, I didn't find data to support this hypothesis.<br>

<br>

<br>

<br>



<img src="table.png">


<br>
<br>



Looking at the performance of the ELEAGUE channel in January 2018 confirms this.


<br>
<br>

<br>







<img src ="ELEAGUES.png">




<br>
<br>

<br> <br> 




**My question**: Is the ELEAGUE Major tournament causing a peak CCV on the whole platform, or is this a seasonal holiday pattern? 
<br><br>
<img src ="PeakJan.png">


<br>

The **ELEAGUE CS MAJOR FINALS** took place on Jan 29th, which also corresponds to the highest recorded peak of the month.
<br>
Now taking a look at the peak CCV of the whole platform: 


<img src ="PeakViewersTwitch.png">



<br> <br> 

We are now looking at peak CCV, which is different from average CCV, but both are derived from the CCV, and peak values can skew the average up!  
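A small numeric sketch shows how a single event peak pulls the monthly average up. The numbers are invented for illustration and are not Stream Hatchet data.

```python
from statistics import median

# Hypothetical platform-wide CCV samples: 29 regular days and one event day.
regular_days = [900_000] * 29
event_day = [2_000_000]
ccv = regular_days + event_day

average_ccv = sum(ccv) / len(ccv)
median_ccv = median(ccv)
peak_ccv = max(ccv)

print(round(average_ccv), median_ccv, peak_ccv)
```

A single event day is enough to lift the average above the level of every regular day, while the median stays at the regular level; comparing the two is one way to tell an event-driven spike from a broad seasonal rise.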

<br>

So sponsors and advertisers beware, since this may not be a seasonal pattern caused by the holidays! 


<br>


## Other insights : 

Is Twitch the best platform for starting a game-streaming career?  <br>

<br>

Let's see how many unique channels we have in each different platform <br>
<br>

<img src ="UniqueChannels.png">


<br>

Twitch is still king; however, beware of Microsoft's Mixer. July this year recorded more unique channels on Mixer than on Twitch, and Mixer is in line to eat some of Twitch's market share!


<br><br>


Despite displaying a rapid and steady growth of content creators (streamers), this overflow of streamers is causing the CCV per channel to be lower than 1 viewer! This means that most of the content being streamed actually has no one viewing it.

<br>

<img src ="avgCCV_channel.png">



<br>

While Twitch is without any doubt the clear winner in terms of CCV across the entire platform, things change when we start looking at CCV and peaks per channel.

<br>

Facebook leads on average CCV per channel, because it has fewer content creators competing for viewership inside the platform. 


<br>
 
<img src ="PeakCCVChannel.png">
 
 
<br>

Despite Twitch's dominance, when we talk about peak values YouTube seems to be on par with it, meaning that the most popular channel on Twitch has almost the same number of viewers as the most popular one on YouTube Gaming.


## 2) Propose a new section for the BI that offers a different perspective.
### Assume that all the metrics present in the BI (game genres, publishers, channels, tournaments,chat, etc.) are available for all the platforms and date ranges.<br>What new insights could a business extract from this new section?<br><br>



## 3) Looking at the metrics that are available in the BI, think of a dataset(s) that you would use to apply Machine Learning to extract new information.
### Explain what techniques you would use and how the new information would be valuable.<br><br>
