<a href="https://colab.research.google.com/github/BoraGitHubble/30-Days-of-Data-Engineering/blob/master/Copy_of_Case_Study_Newzoo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Case Study Newzoo**

##**Introduction**
Newzoo collects data about games on different platforms. One of those platforms is the Epic Games Store. A stakeholder at the company came to us and wanted to know more about the games that are displayed on the Epic Games Store Storefront. The Storefront is the first thing users see when they open the Epic Games Store. Here is a link to the Storefront if you want to see what it looks like: https://store.epicgames.com/en-US/.

##**Overview**
You want to answer the questions that the stakeholder has. You will do that through the following steps.
- Make requests to the Epic Games Store API to get the data that you need.
- Clean and wrangle the data from the Epic Games Store API so that it is easier to read and contains only the necessary fields.
- Analyze the data and answer the stakeholder's questions.


###**Data Acquisition**

####**Scraping Part 1 - Store Front Endpoint**

As mentioned in the introduction, we want to explore the Epic Games Storefront. For this exercise we specifically want to gather data for the `Most Played` games on the Epic Games Store.

This is a two-step process, so there are two scraping parts to tackle.
The first part fetches the game name, offer_id and sandbox_id from the storefront endpoint. These are used in the second part, which scrapes the catalog endpoint.

Scraping multiple endpoints is an important step: it gathers information from different sources and combines it into a comprehensive view of the data.

In [None]:
# There are two imports that we have to do here
import requests
import pandas as pd

# This is the storefront URL that you will request to get the storefront data
URL = "https://store-site-backend-static-ipv4.ak.epicgames.com/storefrontLayout"

# These are the parameters that you send with the request
params = {
    "locale": "en-US",
    "country": "NL"
}

# Fill in the URL and parameters for the request
response = requests.get(URL, params=params)

# Make an empty list called "offers_list"
# It is created before the check so that df_storefront exists even if the request fails
offers_list = []

# Check the response code and make sure that the response is OK
if response.ok:

    # Get the storefront modules key from the Storefront data
    storefront_modules = response.json().get("data", {}).get("Storefront", {}).get("storefrontModules", [])

    # Loop over the storefront modules and find the module-top-lists
    for storefront_module in storefront_modules:
        # "or" guards against keys that are missing or set to None
        if "module-top-lists" in (storefront_module.get("id") or ""):
            # Loop over the modules in the storefront_module and find the Most Played key
            for module in storefront_module.get("modules") or []:
                if "Most Played" in (module.get("title") or ""):
                    # Loop over the offers and fetch title, id, namespace
                    # Save this in dictionary "offers_dict"
                    for offer in module.get("offers") or []:
                        offers_dict = {
                            "game_name": (offer.get("offer") or {}).get("title"),
                            "offer_id": offer.get("id"),
                            "sandbox_id": offer.get("namespace")
                        }
                        # Append the offers_dict to the offers_list
                        offers_list.append(offers_dict)
else:
    print(f"Error in requesting the Epic Games Store API, status code is {response.status_code}")

# Put the offers_list into a dataframe named df_storefront
df_storefront = pd.DataFrame(offers_list, columns=["game_name", "offer_id", "sandbox_id"])

# Put the dataframe here to view
df_storefront

| | game_name | offer_id | sandbox_id |
|---|---|---|---|
| 0 | Fortnite | 09176f4ff7564bbbb499bbe20bd6348f | fn |
| 1 | Rocket League® | 02d44be4c21c4ce094c6151133c91482 | 9773aa1aa54f4f7b80e44bef04986cea |
| 2 | Grand Theft Auto V: Premium Edition | 954871df36d3456ca1face43aa5c2e62 | 0584d2013f0149a791e7b9bad0eec102 |
| 3 | VALORANT | abcd18dfe32b41cb86332f745c73569c | cbd5b3d310a54b12bf3fe8c41994174f |
| 4 | Genshin Impact | acc319019e974ec9a4af28530141d888 | 879b0d8776ab46a59a129983ba78f0ce |
| 5 | Fall Guys | 07ce78560aa34180936b199202274462 | 50118b7f954e450f8823df1614b24e80 |
| 6 | Honkai: Star Rail | 0728f2df169d4abca07e83a452b1ef6c | a2dcbb9e34204bda9da8415f97b3f4ea |
| 7 | NARAKA: BLADEPOINT | bdd8a627c9914901a37edf6347c6b49e | 0c6aee83b9b64372bf44a043001325f2 |
| 8 | Football Manager 2023 | 4435c9c948034b338c92b2a13a9bd993 | 5c7a78e0c4d640898d690c5e38c0392f |
| 9 | Bloons TD 6 | b27e3b556f1048b9824c7196f32afceb | 6a8dfa6e441e4f2f9048a98776c6077d |
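
To check the parsing logic without calling the API, the traversal can be pulled into a function and run on a hand-built payload. This is only a sketch: `extract_most_played` and `sample_payload` are hypothetical names, and the payload values are made up. Only the key names mirror the ones the cell above reads.

```python
def extract_most_played(payload):
    """Return one dict per offer in the 'Most Played' top list of a storefront payload."""
    storefront_modules = (
        payload.get("data", {})
        .get("Storefront", {})
        .get("storefrontModules", [])
    )
    offers = []
    for storefront_module in storefront_modules:
        # Skip every module that is not a top-lists module
        if "module-top-lists" not in (storefront_module.get("id") or ""):
            continue
        for module in storefront_module.get("modules") or []:
            # Skip top lists other than Most Played
            if "Most Played" not in (module.get("title") or ""):
                continue
            for offer in module.get("offers") or []:
                offers.append({
                    "game_name": (offer.get("offer") or {}).get("title"),
                    "offer_id": offer.get("id"),
                    "sandbox_id": offer.get("namespace"),
                })
    return offers


# Hand-built payload: the keys mirror the ones read above, the values are made up
sample_payload = {
    "data": {"Storefront": {"storefrontModules": [
        {"id": "module-carousel", "modules": None},
        {"id": "module-top-lists", "modules": [
            {"title": "Top Sellers", "offers": [
                {"id": "x1", "namespace": "ns-x", "offer": {"title": "Other Game"}},
            ]},
            {"title": "Most Played", "offers": [
                {"id": "a1", "namespace": "ns-a", "offer": {"title": "Game A"}},
                {"id": "b2", "namespace": "ns-b", "offer": {"title": "Game B"}},
            ]},
        ]},
    ]}}
}

most_played = extract_most_played(sample_payload)
print(most_played)
```

Keeping the traversal in a function like this makes it possible to test it against saved responses if the storefront layout changes.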


####**Scraping Part 2 - Catalog Endpoint**


Now that we have our lists of IDs and namespaces, we want to find out some more information about each of these titles. At Newzoo, we're interested in a variety of information about games, ranging from prices to genres (and much more). Some of this information is included in our storefront JSON file, but deeper metadata about each title is unavailable there.

One way to do this would be to visit the page for each of the games and take note of the different metadata for each title. To get an idea of the different information available, here is the link to the page for Fall Guys: https://store.epicgames.com/en-US/p/fall-guys

A quick look at this page shows us a lot of interesting information: genres, features, ratings and more. It's a lot to take in. Luckily, Epic Games has an endpoint that catalogs information about their games. Here's the link for Fall Guys, this time using that endpoint:

https://store.epicgames.com/graphql?operationName=getCatalogOffer&variables={"locale":"en-US","country":"NL","offerId":"07ce78560aa34180936b199202274462","sandboxId":"50118b7f954e450f8823df1614b24e80"}&extensions={"persistedQuery":{"version":1,"sha256Hash":"6797fe39bfac0e6ea1c5fce0ecbff58684157595fee77e446b4254ec45ee2dcb"}}

As you can see, this request is a bit more complex than the request for the storefront. It's made up of different elements, which we've broken down in the code block below.


In [None]:
# Different elements of the catalog endpoint request
base_request_url = "https://store.epicgames.com/graphql"

operation_name = "getCatalogOffer"

variables = {
    "locale": "en-US",
    "country": "NL",
    "offerId": "07ce78560aa34180936b199202274462",
    "sandboxId": "50118b7f954e450f8823df1614b24e80"
}

extensions = {
    "persistedQuery":
        {
            "version": 1,
            "sha256Hash":"6797fe39bfac0e6ea1c5fce0ecbff58684157595fee77e446b4254ec45ee2dcb"
        }
}

headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
}

As you can see, these elements are the same as the ones in the URL above; we have simply broken them down into their separate groups.

We've also provided some headers. We do this so that Epic recognizes us as a browser rather than a Python script running in a Jupyter notebook, which makes it more likely that we get data back.

We can combine the elements above to make a request to this endpoint with Python, like so.

In [None]:
# Request data using above example
import json

params = {
    "operationName": operation_name,
    "variables": json.dumps(variables),
    "extensions": json.dumps(extensions)
}

response = requests.get(url = base_request_url, params = params, headers = headers)
response.status_code

200

Here, our `requests.get` call is more complex than our first request. We build a parameters dictionary that includes our operationName, variables and extensions from above. We make sure to use `json.dumps` on our variables and extensions so that Epic Games receives both as JSON strings.

The response is JSON containing all relevant data about one game, in this case Fall Guys.
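
To see what the `json.dumps` step produces on the wire, the query string can be built and decoded again without sending anything. The sketch below uses only the standard library's `urlencode` as a stand-in for the encoding an HTTP client applies to `params`. The names `url`, `query`, `decoded_variables` and `decoded_extensions` are introduced here for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

base_request_url = "https://store.epicgames.com/graphql"

variables = {
    "locale": "en-US",
    "country": "NL",
    "offerId": "07ce78560aa34180936b199202274462",
    "sandboxId": "50118b7f954e450f8823df1614b24e80",
}

extensions = {
    "persistedQuery": {
        "version": 1,
        "sha256Hash": "6797fe39bfac0e6ea1c5fce0ecbff58684157595fee77e446b4254ec45ee2dcb",
    }
}

# Each nested dictionary is serialized to a JSON string before URL encoding
params = {
    "operationName": "getCatalogOffer",
    "variables": json.dumps(variables),
    "extensions": json.dumps(extensions),
}

# Build the full URL by hand; no request is sent
url = f"{base_request_url}?{urlencode(params)}"
print(url)

# Decode the query string again and parse the JSON strings back into dictionaries
query = parse_qs(urlsplit(url).query)
decoded_variables = json.loads(query["variables"][0])
decoded_extensions = json.loads(query["extensions"][0])
print(decoded_variables == variables, decoded_extensions == extensions)
```

Without `json.dumps`, the nested dictionaries would not arrive as the JSON strings that the example URL above contains.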

**How can we use this call and our list of game ids and namespaces to capture all of the metadata for our top played list?**


In [None]:
import json

data_list = []

# fill in the base_request_url for request
base_request_url = "https://store.epicgames.com/graphql"

# fill in the operation_name for request
operation_name = "getCatalogOffer"

variables = {
    "locale": "en-US",
    "country": "NL",
}

# fill in the extensions for the request
extensions = {
    "persistedQuery":
        {
            "version": 1,
            "sha256Hash":"6797fe39bfac0e6ea1c5fce0ecbff58684157595fee77e446b4254ec45ee2dcb"
        }
}

# fill in the headers for the request
headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/113.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
}

# Use the Dataframe that you created in the last code block
for index, row in df_storefront.iterrows():

  # Fetch offerId and sandboxId from the Dataframe and store them in the following variables
  variables["offerId"] = row["offer_id"]
  variables["sandboxId"] = row["sandbox_id"]

  # fill in the parameters for the request
  params = {
      "operationName": operation_name,
      "variables": json.dumps(variables),
      "extensions": json.dumps(extensions)
  }

  # fill in the url, params and headers for the request
  response = requests.get(
      url=base_request_url,
      params=params,
      headers=headers
  )

  # Check the response code and make sure that the response is OK
  if response.ok:

    # Append the response json to data_list
    data_list.append(response.json())

    # print the offer_id
    print(f"Data captured for id: {row['offer_id']}")
  else:
    # print offer_id and status_code when the request fails
    print(f"Error capturing data for id: {row['offer_id']}, status code is {response.status_code}")

Data captured for id: 09176f4ff7564bbbb499bbe20bd6348f
Data captured for id: 02d44be4c21c4ce094c6151133c91482
Data captured for id: 954871df36d3456ca1face43aa5c2e62
Data captured for id: abcd18dfe32b41cb86332f745c73569c
Data captured for id: acc319019e974ec9a4af28530141d888
Data captured for id: 07ce78560aa34180936b199202274462
Data captured for id: 0728f2df169d4abca07e83a452b1ef6c
Data captured for id: bdd8a627c9914901a37edf6347c6b49e
Data captured for id: 4435c9c948034b338c92b2a13a9bd993
Data captured for id: b27e3b556f1048b9824c7196f32afceb
Data captured for id: fe74b3dad04846e5a58f62aebd3858b6
Data captured for id: 014f265f264f46e6b5d59c738cf24ee4
Data captured for id: 76ec3e438cab4064b8e1de921ee4755a
Data captured for id: 268fd6ea355740d6ba4c76c3ffd4cbe0
Data captured for id: 1126e6bcac0549b0bda00be4a1f69327
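
One possible variation on the loop above wraps the request in a function that takes a session object, sets a timeout, and pauses between calls. This is a sketch under assumptions, not part of the original exercise: `fetch_catalog_offer`, `fetch_all`, the 10 second timeout and the 0.5 second delay are all hypothetical choices.

```python
import json
import time

BASE_REQUEST_URL = "https://store.epicgames.com/graphql"
OPERATION_NAME = "getCatalogOffer"
EXTENSIONS = {
    "persistedQuery": {
        "version": 1,
        "sha256Hash": "6797fe39bfac0e6ea1c5fce0ecbff58684157595fee77e446b4254ec45ee2dcb",
    }
}


def fetch_catalog_offer(session, offer_id, sandbox_id, timeout=10):
    """Return the parsed JSON for one offer, or None when the request fails.

    `session` is any object with a requests-style `get` method,
    for example `requests.Session()`.
    """
    variables = {
        "locale": "en-US",
        "country": "NL",
        "offerId": offer_id,
        "sandboxId": sandbox_id,
    }
    params = {
        "operationName": OPERATION_NAME,
        "variables": json.dumps(variables),
        "extensions": json.dumps(EXTENSIONS),
    }
    response = session.get(BASE_REQUEST_URL, params=params, timeout=timeout)
    if not response.ok:
        print(f"Error capturing data for id: {offer_id}, status code is {response.status_code}")
        return None
    return response.json()


def fetch_all(session, offers, delay_seconds=0.5):
    """Fetch every (offer_id, sandbox_id) pair, pausing between requests."""
    results = []
    for offer_id, sandbox_id in offers:
        payload = fetch_catalog_offer(session, offer_id, sandbox_id)
        if payload is not None:
            results.append(payload)
        # Hypothetical politeness delay so the endpoint is not hit in a tight loop
        time.sleep(delay_seconds)
    return results
```

In the notebook this could be called with a `requests.Session()` whose headers were set through `session.headers.update(headers)`, passing `zip(df_storefront["offer_id"], df_storefront["sandbox_id"])` as `offers`. Because the session is passed in, the functions can also be exercised with a stub object instead of the live endpoint.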


###**Data Wrangling**

In [None]:


# Create empty list named data_extraction_list
data_extraction_list = []

# Loop over the data_list from the last code block
for i, game_data in enumerate(data_list):
  # Create empty dictionary named data_dict
  data_dict = {}
  # Fetch the data from game_data ("or {}" also covers a key that is present but None)
  data = game_data.get("data") or {}
  # Fetch Catalog from data
  catalog = data.get("Catalog") or {}
  # Fetch catalogOffer from catalog
  catalog_offer = catalog.get("catalogOffer")
  # Skip responses without a catalogOffer, since the lookups below need it
  if not catalog_offer:
    continue
  # Get the ranking
  data_dict["ranking"] = i + 1
  # Get the id
  data_dict["offer_id"] = catalog_offer.get("id")
  # Get the offerType
  data_dict["offer_type"] = catalog_offer.get("offerType")
  # Get the releaseDate
  data_dict["release_date"] = catalog_offer.get("releaseDate")
  # The next lines of code are more complex than the rest of the exercise,
  # so feel free to ask if you are struggling

  # Get the discountPrice
  data_dict["discount_price"] = catalog_offer.get("price", {}).get("totalPrice", {}).get("discountPrice")
  # Get the originalPrice
  data_dict["original_price"] = catalog_offer.get("price", {}).get("totalPrice", {}).get("originalPrice")
  # Get the genres
  data_dict["genres"] = [tag.get("name") for tag in catalog_offer.get("tags",[]) if tag.get("groupName") == "genre"]
  # Append the data_dict to data_extraction_list
  data_extraction_list.append(data_dict)

# Save the data_extraction_list as dataframe named df_catalog
df_catalog = pd.DataFrame(data_extraction_list)

# Merge df_storefront and df_catalog dataframes together on offer_id
df = df_storefront.merge(df_catalog, on="offer_id")

# Put the dataframe here to view
df


| | game_name | offer_id | sandbox_id | ranking | offer_type | release_date | discount_price | original_price | genres |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Fortnite | 09176f4ff7564bbbb499bbe20bd6348f | fn | 1 | OTHERS | 2017-07-21T09:00:00.000Z | 0 | 0 | [Action, Shooter] |
| 1 | Rocket League® | 02d44be4c21c4ce094c6151133c91482 | 9773aa1aa54f4f7b80e44bef04986cea | 2 | BASE_GAME | 2020-09-23T15:00:00.000Z | 0 | 0 | [Racing] |
| 2 | Grand Theft Auto V: Premium Edition | 954871df36d3456ca1face43aa5c2e62 | 0584d2013f0149a791e7b9bad0eec102 | 3 | BASE_GAME | 2020-05-14T15:00:00.000Z | 1499 | 2999 | [Action, Adventure] |
| 3 | VALORANT | abcd18dfe32b41cb86332f745c73569c | cbd5b3d310a54b12bf3fe8c41994174f | 4 | BASE_GAME | 2021-11-04T16:30:00.000Z | 0 | 0 | [Action, Shooter] |
| 4 | Genshin Impact | acc319019e974ec9a4af28530141d888 | 879b0d8776ab46a59a129983ba78f0ce | 5 | BASE_GAME | 2021-06-08T23:00:00.000Z | 0 | 0 | [Fantasy, RPG, Open World, Adventure] |
| 5 | Fall Guys | 07ce78560aa34180936b199202274462 | 50118b7f954e450f8823df1614b24e80 | 6 | BASE_GAME | 2022-06-21T08:00:00.000Z | 0 | 0 | [Party] |
| 6 | Honkai: Star Rail | 0728f2df169d4abca07e83a452b1ef6c | a2dcbb9e34204bda9da8415f97b3f4ea | 7 | BASE_GAME | 2023-04-26T00:30:00.000Z | 0 | 0 | [RPG, Adventure] |
| 7 | NARAKA: BLADEPOINT | bdd8a627c9914901a37edf6347c6b49e | 0c6aee83b9b64372bf44a043001325f2 | 8 | BASE_GAME | 2021-12-08T16:00:00.000Z | 0 | 0 | [Action] |
| 8 | Football Manager 2023 | 4435c9c948034b338c92b2a13a9bd993 | 5c7a78e0c4d640898d690c5e38c0392f | 9 | BASE_GAME | 2022-11-08T00:00:00.000Z | 5999 | 5999 | [Simulation] |
| 9 | Bloons TD 6 | b27e3b556f1048b9824c7196f32afceb | 6a8dfa6e441e4f2f9048a98776c6077d | 10 | BASE_GAME | 2022-07-19T12:00:00.000Z | 659 | 1099 | [Strategy] |
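
`merge` performs an inner join by default, so an offer that appears in only one of the two dataframes disappears from the result without any warning. The sketch below shows one way to surface such rows with `how="outer"` and `indicator=True`. The two demo frames and their ids are placeholders, not real Epic Games Store offers.

```python
import pandas as pd

# Toy frames: the ids are placeholders, not real Epic Games Store offers
df_storefront_demo = pd.DataFrame({
    "game_name": ["Game A", "Game B", "Game C"],
    "offer_id": ["a1", "b2", "c3"],
})
df_catalog_demo = pd.DataFrame({
    "offer_id": ["a1", "b2", "d4"],
    "offer_type": ["BASE_GAME", "BASE_GAME", "OTHERS"],
})

# The default inner join keeps only ids present in both frames
inner = df_storefront_demo.merge(df_catalog_demo, on="offer_id")

# An outer join with indicator=True adds a _merge column that says
# whether each row came from the left frame, the right frame, or both
outer = df_storefront_demo.merge(
    df_catalog_demo, on="offer_id", how="outer", indicator=True
)
unmatched = outer[outer["_merge"] != "both"]

print(len(inner))
print(unmatched[["offer_id", "_merge"]])
```

Checking `unmatched` before analysis is a cheap way to confirm that the storefront list and the catalog responses line up.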


###**Data Validation**

1.   How many games are discounted in the most played games list?
2.   Which games in the most played games are action games?
3.   What is the most recently released game in the most played games list?




In [None]:
#Answer 1
df["discounted"] = df["discount_price"] - df["original_price"]
print(df['discounted'][df["discounted"] < 0].count())


#Answer 2
game_list = []
for index, row in df.iterrows():
  if "Action" in row["genres"]:
    game_list.append(row["game_name"])
print(game_list)


#Answer 3
print(df.sort_values(by=["release_date"], ascending=False).reset_index()["release_date"][0])



4
['Fortnite', 'Grand Theft Auto V: Premium Edition', 'VALORANT', 'NARAKA: BLADEPOINT', 'Dying Light Enhanced Edition']
2023-09-14T15:00:00.000Z
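
The same three questions can also be answered with vectorized pandas operations. The sketch below is an alternative rather than the reference solution: it runs on `df_demo`, a hypothetical frame holding four rows copied from the merged table above, and it returns the name of the most recent game instead of only its release date.

```python
import pandas as pd

# Four rows copied from the merged table above
df_demo = pd.DataFrame({
    "game_name": [
        "Fortnite",
        "Grand Theft Auto V: Premium Edition",
        "Honkai: Star Rail",
        "Bloons TD 6",
    ],
    "release_date": [
        "2017-07-21T09:00:00.000Z",
        "2020-05-14T15:00:00.000Z",
        "2023-04-26T00:30:00.000Z",
        "2022-07-19T12:00:00.000Z",
    ],
    "discount_price": [0, 1499, 0, 659],
    "original_price": [0, 2999, 0, 1099],
    "genres": [
        ["Action", "Shooter"],
        ["Action", "Adventure"],
        ["RPG", "Adventure"],
        ["Strategy"],
    ],
})

# Question 1: a game is discounted when its current price is below its original price
discounted_count = int((df_demo["discount_price"] < df_demo["original_price"]).sum())

# Question 2: keep the rows whose genre list contains "Action"
is_action = df_demo["genres"].apply(lambda genres: "Action" in genres)
action_games = df_demo.loc[is_action, "game_name"].tolist()

# Question 3: parse the ISO timestamps, then look up the row with the latest date
release_dates = pd.to_datetime(df_demo["release_date"], utc=True)
latest_game = df_demo.loc[release_dates.idxmax(), "game_name"]

print(discounted_count)
print(action_games)
print(latest_game)
```

Parsing the dates with `pd.to_datetime` compares real timestamps rather than strings, which stays correct even if the timestamp format is not uniform.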


Thank you for your participation! This is the end of the Case Study. We hope you liked it and learned something today!