# Steam Project

### The Steam video game platform


#### Company description
Steam is a digital video game distribution service and online storefront operated by Valve. Launched in September 2003 to automate updates for Valve's own games, it expanded to distributing third-party titles in late 2005. Steam offers a range of features, including digital rights management (DRM), game server matchmaking with Valve's anti-cheat measures, social networking, and game streaming. The Steam client itself provides automatic updates, cloud storage of game progress, and community features such as direct messaging, an in-game overlay, and a virtual marketplace for collectible items.

#### Project
You work at Ubisoft, a French video game publisher. The company wants to release a revolutionary new video game and has asked you to carry out a broad analysis of the games available on the Steam marketplace, in order to better understand the video game ecosystem and current trends.

#### Objectives
The ultimate goal of this project is to understand the factors that influence a video game's popularity or sales. Your manager has also asked you to use this opportunity to analyze the global video game market.

In [0]:
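from pyspark.sql import functions as F
# Note: the wildcard import below shadows Python built-ins such as sum, min, max and round
# with their Spark column equivalents; prefer the F. prefix where possible.
from pyspark.sql.functions import *
from pyspark.sql.types import ArrayType, StructType, StringType, BooleanType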



In [0]:
filepath = "s3://full-stack-bigdata-datasets/Big_Data/Project_Steam/steam_game_output.json"

In [0]:
steam_df = (spark.read.format('json')
           .load(filepath))

In [0]:
print(steam_df.columns)


['data', 'id']


In [0]:
num_rows = steam_df.count()
num_columns = len(steam_df.columns)

print(f"Row = {num_rows}, Column = {num_columns}")

Row = 55691, Column = 2


# Inspecting the nested columns

In [0]:
steam_df.printSchema()

root
 |-- data: struct (nullable = true)
 |    |-- appid: long (nullable = true)
 |    |-- categories: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- ccu: long (nullable = true)
 |    |-- developer: string (nullable = true)
 |    |-- discount: string (nullable = true)
 |    |-- genre: string (nullable = true)
 |    |-- header_image: string (nullable = true)
 |    |-- initialprice: string (nullable = true)
 |    |-- languages: string (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- negative: long (nullable = true)
 |    |-- owners: string (nullable = true)
 |    |-- platforms: struct (nullable = true)
 |    |    |-- linux: boolean (nullable = true)
 |    |    |-- mac: boolean (nullable = true)
 |    |    |-- windows: boolean (nullable = true)
 |    |-- positive: long (nullable = true)
 |    |-- price: string (nullable = true)
 |    |-- publisher: string (nullable = true)
 |    |-- release_date: string (nullable = true)
 |    |-
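
The printed schema shows that the real columns sit one or two levels below `data`. As a minimal sketch, the helper below lists the dotted path of every leaf field. It assumes the schema is available as a plain dictionary in the shape returned by `steam_df.schema.jsonValue()`. The `leaf_paths` function and the `example_schema` excerpt are illustrative and not part of the original notebook, and the excerpt only covers a few of the fields visible in the output above.

```python
def leaf_paths(schema_json, prefix=""):
    """Return the dotted path of every non-struct field in a struct schema dict."""
    paths = []
    for field in schema_json["fields"]:
        name = prefix + field["name"]
        dtype = field["type"]
        # Nested structs are dicts with type == "struct"; recurse into them.
        if isinstance(dtype, dict) and dtype.get("type") == "struct":
            paths.extend(leaf_paths(dtype, prefix=name + "."))
        else:
            # Simple types ("long", "string", ...) and arrays are treated as leaves.
            paths.append(name)
    return paths


# Hand-written excerpt (assumption): only a few fields from the printed schema.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "data", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "appid", "type": "long", "nullable": True, "metadata": {}},
                {"name": "categories", "nullable": True, "metadata": {},
                 "type": {"type": "array", "elementType": "string", "containsNull": True}},
                {"name": "platforms", "nullable": True, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "linux", "type": "boolean", "nullable": True, "metadata": {}},
                        {"name": "mac", "type": "boolean", "nullable": True, "metadata": {}},
                        {"name": "windows", "type": "boolean", "nullable": True, "metadata": {}},
                    ],
                }},
            ],
        }},
    ],
}

paths = leaf_paths(example_schema)
print(paths)
```

In the notebook itself, the same helper could be called as `leaf_paths(steam_df.schema.jsonValue())` to get the full list of paths before deciding which structs to flatten.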

# Flattening the nested 'data' and 'platforms' structs into a DataFrame with flat columns


In [0]:
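# "data.*" promotes every field of the `data` struct to a top-level column;
# "data.platforms.*" does the same for the nested `platforms` struct (linux, mac, windows).
steam_df_flat = steam_df.select("data.*", "data.platforms.*")
# Drop the original `platforms` struct, now redundant with the three boolean columns.
steam_df_flat = steam_df_flat.drop("platforms")
steam_df_flat.display()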

appid,categories,ccu,developer,discount,genre,header_image,initialprice,languages,name,negative,owners,positive,price,publisher,release_date,required_age,short_description,tags,type,website,linux,mac,windows
10,"List(Multi-player, Valve Anti-Cheat enabled, Online PvP, Shared/Split Screen PvP, PvP)",13990,Valve,0,Action,https://cdn.akamai.steamstatic.com/steam/apps/10/header.jpg?t=1666823513,999,"English, French, German, Italian, Spanish - Spain, Simplified Chinese, Traditional Chinese, Korean",Counter-Strike,5199,"10,000,000 .. 20,000,000",201215,999,Valve,2000/11/1,0,Play the world's number 1 online action game. Engage in an incredibly realistic brand of terrorist warfare in this wildly popular team-based game. Ally with teammates to complete strategic missions. Take out enemy sites. Rescue hostages. Your role affects your team's success. Your team's success affects your role.,"List(266, 1191, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 5426, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 227, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 2784, null, null, null, null, null, null, null, null, null, null, null, null, 1607, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 4831, null, null, null, null, null, null, null, null, null, 1707, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, 632, null, null, null, null, null, null, null, null, null, null, null, 3392, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 131, null, null, 769, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 881, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 289, null, null, null, 3353, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 614, null, null, null, null, null, null, 304, null, null, null, 1344, null, null, 1864, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 1192)",game,,True,True,True
1000000,"List(Single-player, Partial Controller Support, Steam Achievements, Steam Cloud)",0,IndigoBlue Game Studio,0,"Action, Adventure, Indie",https://cdn.akamai.steamstatic.com/steam/apps/1000000/header.jpg?t=1655723048,999,"English, Korean, Simplified Chinese",ASCENXION,5,"0 .. 20,000",27,999,PsychoFlux Entertainment,2021/05/14,0,"ASCENXION is a 2D shoot 'em up game where you explore the field to progress. Players must overcome puzzles, traps, elite units, boss fights, and other various obstacles while navigating the field. Grow stronger through rewards earned, to uncover the truth of this world.","List(null, null, null, 159, null, null, null, null, null, null, null, null, null, null, null, null, 111, 138, null, null, null, null, null, 73, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 88, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 179, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 124, null, null, null, null, null, null, 148, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 161, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 51, null, null, null, null, null, null, null, 38, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 69, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 181, null, null, null, 136, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 100, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 186, 159, null, null, 175, null, null, 71, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 170, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,,False,False,True
1000010,"List(Single-player, Partial Controller Support, Steam Achievements, Steam Cloud, Steam Trading Cards)",99,NEXT Studios,70,"Adventure, Indie, RPG, Strategy",https://cdn.akamai.steamstatic.com/steam/apps/1000010/header.jpg?t=1655724189,1999,"Simplified Chinese, English, Japanese, Traditional Chinese, French, German, Spanish - Spain, Russian, Portuguese - Brazil",Crown Trick,646,"200,000 .. 500,000",4032,599,"Team17, NEXT Studios",2020/10/16,0,"Enter a labyrinth that moves as you move, where mastering the elements is key to defeating enemies and uncovering the mysteries of this underground world. With a new experience awaiting every time you enter the dungeon, let the power bestowed by the crown guide you in this challenging adventure!","List(null, null, null, 205, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 179, null, 225, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 189, null, null, null, null, null, null, null, 225, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 179, null, null, null, null, 217, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, 171, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 231, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 237, null, null, null, null, null, null, null, null, null, null, 192, null, null, null, null, null, 268, 226, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 211, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 225, null, null, null, null, null, null, null, null, null, null, 184, null, null, null, null, null, null, null, null, null, null, null, null, null, 178, null, null, null, null, null, null, null, null, null, null, null, 222, 254, 216, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,,False,False,True
1000030,"List(Multi-player, Single-player, Co-op, Steam Achievements, Steam Cloud, Shared/Split Screen, Full controller support, Steam Trading Cards, Shared/Split Screen Co-op, Remote Play on Phone, Remote Play on Tablet, Remote Play on TV, Remote Play Together)",76,Vertigo Gaming Inc.,0,"Action, Indie, Simulation, Strategy",https://cdn.akamai.steamstatic.com/steam/apps/1000030/header.jpg?t=1660866300,1999,English,"Cook, Serve, Delicious! 3?!",115,"100,000 .. 200,000",1575,1999,Vertigo Gaming Inc.,2020/10/14,0,"Cook, serve and manage your food truck as you dish out hundreds of different foods across war-torn America in this massive sequel to the million-selling series!","List(null, null, null, 187, null, null, null, null, null, null, null, null, null, null, null, null, null, 175, null, null, null, null, null, null, null, null, null, null, null, null, null, 200, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 209, null, null, null, null, null, null, null, null, null, null, null, 175, 123, null, null, null, null, null, null, null, 176, null, null, null, null, null, 119, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 208, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 175, null, null, null, 120, null, null, null, null, null, null, null, null, null, 184, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 163, 158, null, null, null, null, null, null, null, null, null, 213, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 157, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 182, 134, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 190, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 221, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,http://www.cookservedelicious.com,False,True,True
1000040,List(Single-player),0,DoubleC Games,0,"Action, Casual, Indie, Simulation",https://cdn.akamai.steamstatic.com/steam/apps/1000040/header.jpg?t=1627033870,199,Simplified Chinese,细胞战争,1,"0 .. 20,000",0,199,DoubleC Games,2019/03/30,0,这是一款打击感十足的细胞主题游戏！操作简单但活下去却不简单，“你”作为侵入人体的细菌病毒，通过与细胞之间的战斗来获得基因变异点数和进入下一关的资格，每种细菌病毒都有独特的能力和攻击效果，你是否可以破坏五大器官并占领人体呢！？,"List(null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 22, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 22, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 21, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 20, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,,False,False,True
1000080,"List(Multi-player, Single-player, Steam Achievements, Full controller support, Steam Trading Cards)",3,IndieLeague Studio,60,"Action, Adventure, Indie, RPG",https://cdn.akamai.steamstatic.com/steam/apps/1000080/header.jpg?t=1667062553,1999,"Simplified Chinese, English, Traditional Chinese, Japanese, Korean",Zengeon,462,"100,000 .. 200,000",1018,799,2P Games,2019/06/24,0,Zengeon is an anime infused Action RPG and Roguelite with a selection of unique characters and varying play-styles. Slaughter your way through demonic hordes and colossal bosses with hundreds of combination and skill possibilities!,"List(null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 138, 77, null, 111, null, null, 115, null, null, null, null, null, null, 84, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 31, null, null, null, null, null, null, null, null, null, null, null, 34, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 38, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 36, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 57, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 128, null, null, null, null, null, null, 30, null, null, null, null, null, null, null, null, null, 51, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, 62, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 38, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 121, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 53, 67, null, null, null, null, null, null, null, null, null, null, null, null, null, 33, null, null, null, null, null, null, null, 46, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,,False,True,True
1000100,"List(Single-player, Steam Achievements, Steam Cloud)",0,七月九日,0,"Adventure, Indie, RPG, Strategy",https://cdn.akamai.steamstatic.com/steam/apps/1000100/header.jpg?t=1561522270,1299,"Japanese, Simplified Chinese, Traditional Chinese",干支セトラ　陽ノ卷｜干支etc.　陽之卷,6,"0 .. 20,000",18,1299,Starship Studio,2019/01/24,0,耐用年数を超えて綻びゆく都市風水を修復するため、次代の風水師候補に選ばれた主人公。 綻びから襲い来る妖異を討ち祓い、半年後の認定試験に挑むため与えられたのは、十二支――陰陽五行の化身である１２人の男達だった。 それは、400年に渡る首都繁栄の終焉。,"List(null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 20, null, null, null, null, null, null, 11, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 10, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 20, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 20, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 20, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,http://0709.noor.jp/etc,False,False,True
1000110,"List(Multi-player, Single-player, Co-op, Online PvP, Online Co-op, PvP)",0,重庆环游者网络科技,0,"Action, Adventure, Casual, Free to Play, Massively Multiplayer",https://cdn.akamai.steamstatic.com/steam/apps/1000110/header.jpg?t=1562917106,0,"English, Simplified Chinese, Traditional Chinese",Jumping Master(跳跳大咖),34,"20,000 .. 50,000",50,0,重庆环游者网络科技,2019/04/8,0,Jumping Master is a innovative casual competitive game fully merged with classic gameplay of mushroom game. It aims to help us revisit the classic mushroom game in our chlidhood.,"List(null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 24, null, null, null, null, null, 23, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 24, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 26, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 25, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,http://www.huanyz.com/bzjj/,False,False,True
1000130,"List(Single-player, Steam Achievements, Steam Leaderboards)",0,Simon Codrington,0,"Casual, Indie",https://cdn.akamai.steamstatic.com/steam/apps/1000130/header.jpg?t=1646024900,299,English,Cube Defender,0,"0 .. 20,000",6,299,Simon Codrington,2019/01/6,0,Build turrets and destroy wave after wave of cube enemies in this minimalist and addictive tower defence game. Battle it out with a bunch of cube enemies and blast them off the map with a range of unique turrets.,"List(null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 10, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 31, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 10, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 31, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 11, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 10, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 16, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null)",game,,False,True,True
1000280,List(Single-player),0,Villain Role,0,"Indie, RPG",https://cdn.akamai.steamstatic.com/steam/apps/1000280/header.jpg?t=1649211613,1399,"English, Simplified Chinese, Traditional Chinese",Tower of Origin2-Worm's Nest,12,"0 .. 20,000",32,1399,Villain Role,2021/09/9,0,"As the protagonist，the Balrog Princess— HongYe, after helping humans assess the rebellion. She thought she couldhave a stable life, but unexpectedly fell into a bigger crisis. What kind of ♂hardships ♂ will she experience this time, and how ♂interesting experiences♂ she will encounter.","List(null, null, null, 101, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 86, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 129, null, null, null, 81, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 84, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 146, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 70, null, null, null, null, null, null, 137, null, null, null, null, null, 141, null, null, null, null, null, null, null, null, null, null, null, null, 129, null, null, null, null, null, null, null, null, 150, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, 89, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 107, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 155, null, null, null, null, null, null, null, 77, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 92, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 91, 112, null, null, null, null, 95, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 122, null, null, null, null, null)",game,https://weibo.com/u/7623414897,False,False,True


In [0]:
steam_df_flat.describe().toPandas()

Unnamed: 0,summary,appid,ccu,developer,discount,genre,header_image,initialprice,languages,name,negative,owners,positive,price,publisher,release_date,required_age,short_description,type,website
0,count,55691.0,55691.0,55691,55691.0,55691,55691,55691.0,55691,55691,55691.0,55691,55691.0,55691.0,55691,55691,55691,55691,55691,55691
1,mean,1025603.0926720656,138.9596164550825,67392.0,2.603777989262179,,,797.5663033524268,,Infinity,241.8376937027527,,1470.8755992889337,773.2849832109317,2001.0,,0.1978882344490734,,,
2,stddev,522784.968328345,6002.067909130765,210681.70504552333,12.887080174743176,,,1104.762477841338,,,5765.413761559615,,30982.733479534887,1093.13458272345,1921.8937275510318,,2.2962924614818236,,,
3,min,10.0,0.0,,0.0,,https://cdn.akamai.steamstatic.com/steam/apps/...,0.0,,Fieldrunners 2,0.0,"0 .. 20,000",0.0,0.0,,,0,,game,
4,max,2190950.0,874053.0,＼上／,90.0,Web Publishing,https://cdn.akamai.steamstatic.com/steam/apps/...,9999.0,Turkish,～Daydream～蝶が舞う頃に,908515.0,"500,000 .. 1,000,000",5943345.0,9999.0,Ｌｅｍｏｎ　Ｂａｌｍ,2022/11/7,MA 15+,🚗 Take part in a roller coaster of emotions wi...,hardware,www.windybeard.com


In [0]:
steam_df_flat.printSchema()

root
 |-- appid: long (nullable = true)
 |-- categories: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- ccu: long (nullable = true)
 |-- developer: string (nullable = true)
 |-- discount: string (nullable = true)
 |-- genre: string (nullable = true)
 |-- header_image: string (nullable = true)
 |-- initialprice: string (nullable = true)
 |-- languages: string (nullable = true)
 |-- name: string (nullable = true)
 |-- negative: long (nullable = true)
 |-- owners: string (nullable = true)
 |-- positive: long (nullable = true)
 |-- price: string (nullable = true)
 |-- publisher: string (nullable = true)
 |-- release_date: string (nullable = true)
 |-- required_age: string (nullable = true)
 |-- short_description: string (nullable = true)
 |-- tags: struct (nullable = true)
 |    |-- 1980s: long (nullable = true)
 |    |-- 1990's: long (nullable = true)
 |    |-- 2.5D: long (nullable = true)
 |    |-- 2D: long (nullable = true)
 |    |-- 2D Fighter: long (nulla

In [0]:
num_rows = steam_df_flat.count()
num_columns = len(steam_df_flat.columns)

print(f"Row = {num_rows}, Column = {num_columns}")

Row = 55691, Column = 24


# Analysis of the "release_date" and "price" columns

In [0]:
# Check whether the release dates share a consistent format
steam_df_flat.select("release_date").distinct().show(10, truncate=False)


+------------+
|release_date|
+------------+
|2020/10/16  |
|2019/03/30  |
|2019/04/8   |
|2019/12/17  |
|2020/10/14  |
|2000/11/1   |
|2019/01/24  |
|2019/06/24  |
|2019/01/6   |
|2021/05/14  |
+------------+
only showing top 10 rows



In [0]:
steam_df_flat.select("release_date") \
    .withColumn("len", length("release_date")) \
    .groupBy("len") \
    .count() \
    .orderBy("len") \
    .show()

+---+-----+
|len|count|
+---+-----+
|  0|   99|
|  7|  123|
|  9|15732|
| 10|39737|
+---+-----+



### There are 99 missing dates, and the dates do not all share the same format (e.g. 9-character values such as 2019/03/4 instead of 2019/03/04; the 123 values of length 7 are too short to hold a four-digit year, a month, and a day). The dates therefore need to be converted to a consistent yyyy/MM/dd format.
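
The padding issue can be illustrated outside Spark. The sketch below uses Python's standard `datetime.strptime`, which accepts unpadded month and day fields, to rewrite a few of the sample values shown above in a fixed `yyyy/MM/dd` layout. The helper name `normalize_release_date` is hypothetical, and `"2019/03"` is only an assumed example of a 7-character value, since the notebook does not display those rows.

```python
from datetime import datetime
from typing import Optional


def normalize_release_date(raw: Optional[str]) -> Optional[str]:
    """Return the date as zero-padded yyyy/MM/dd, or None if it cannot be parsed."""
    if raw is None or raw.strip() == "":
        return None
    try:
        # strptime accepts both "4" and "04" for %m and %d
        parsed = datetime.strptime(raw.strip(), "%Y/%m/%d")
    except ValueError:
        # Anything that is not a full year/month/day date is treated as missing
        return None
    return parsed.strftime("%Y/%m/%d")


# "2019/03" is a hypothetical 7-character value used only for illustration
samples = ["2020/10/16", "2019/04/8", "2000/11/1", "", "2019/03"]
for value in samples:
    print(repr(value), len(value), normalize_release_date(value))
```

A date with a four-digit year, a month, and a day needs at least 8 characters (`yyyy/M/d`), so a 7-character string cannot be normalized this way and comes back as `None`.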

## Cleaning the "release_date" data

In [0]:
# Replace empty or placeholder values with null
clean_df_steam = steam_df_flat.withColumn(
    "release_date_clean",
    when(col("release_date").isin("", "None", "NaN"), None).otherwise(col("release_date"))
)


In [0]:
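# Parse the cleaned string column (placeholders already set to null) rather than the raw one,
# and cast the price string to an integer
clean_df_steam = clean_df_steam \
    .withColumn("release_date_clean", F.to_timestamp(F.col("release_date_clean"), format="y/M/d")) \
    .withColumn("price_int", F.col("price").cast("int"))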


In [0]:
clean_df_steam.select(F.col("release_date_clean")).show(10)

+-------------------+
| release_date_clean|
+-------------------+
|2000-11-01 00:00:00|
|2021-05-14 00:00:00|
|2020-10-16 00:00:00|
|2020-10-14 00:00:00|
|2019-03-30 00:00:00|
|2019-06-24 00:00:00|
|2019-01-24 00:00:00|
|2019-04-08 00:00:00|
|2019-01-06 00:00:00|
|2021-09-09 00:00:00|
+-------------------+
only showing top 10 rows



In [0]:
clean_df_steam = clean_df_steam.drop("release_date", "price")


In [0]:
clean_df_steam.printSchema()

root
 |-- appid: long (nullable = true)
 |-- categories: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- ccu: long (nullable = true)
 |-- developer: string (nullable = true)
 |-- discount: string (nullable = true)
 |-- genre: string (nullable = true)
 |-- header_image: string (nullable = true)
 |-- initialprice: string (nullable = true)
 |-- languages: string (nullable = true)
 |-- name: string (nullable = true)
 |-- negative: long (nullable = true)
 |-- owners: string (nullable = true)
 |-- positive: long (nullable = true)
 |-- publisher: string (nullable = true)
 |-- required_age: string (nullable = true)
 |-- short_description: string (nullable = true)
 |-- tags: struct (nullable = true)
 |    |-- 1980s: long (nullable = true)
 |    |-- 1990's: long (nullable = true)
 |    |-- 2.5D: long (nullable = true)
 |    |-- 2D: long (nullable = true)
 |    |-- 2D Fighter: long (nullable = true)
 |    |-- 2D Platformer: long (nullable = true)
 |    |-- 360 Video: 

# Overall analysis: missing values and duplicates

In [0]:
# Check that the row and column counts still match steam_df_flat
num_rows = clean_df_steam.count()
num_columns = len(clean_df_steam.columns)

print(f"Row = {num_rows}, Column = {num_columns}")

Row = 55691, Column = 24


In [0]:
# Missing (null) values per column
for c in clean_df_steam.columns:
    count_null_c = clean_df_steam.filter(clean_df_steam[c].isNull()).count()
    print(c, count_null_c)

appid 0
categories 0
ccu 0
developer 0
discount 0
genre 0
header_image 0
initialprice 0
languages 0
name 0
negative 0
owners 0
positive 0
publisher 0
required_age 0
short_description 0
tags 0
type 0
website 0
linux 0
mac 0
windows 0
release_date_clean 222
price_int 0


In [0]:
# Duplicates (full-row comparison)
if clean_df_steam.count() == clean_df_steam.dropDuplicates().count():
    print("No duplicates")
else:
    print("Warning: duplicates found")

No duplicates
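
`dropDuplicates()` without arguments compares whole rows, so two records that share an `appid` but differ in any other field would not be detected. A key-level check is a natural complement. The plain-Python sketch below shows the idea on a short, made-up list of identifiers (the repeated value is deliberate and hypothetical). In PySpark the same check could be expressed with `groupBy("appid").count()` followed by a filter on `count > 1`.

```python
from collections import Counter


def duplicated_keys(keys):
    """Return the keys that appear more than once, mapped to their counts."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical appid list: 1000130 is repeated on purpose for the demonstration
appids = [1000130, 1000280, 1000130]
print(duplicated_keys(appids))
```

An empty dictionary means every key is unique.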


In [0]:
# Count placeholder values: "" (empty cell), "None" (absent data), "NaN" (invalid or missing numeric data).
empty_values = ["", "None", "NaN"]

for c in clean_df_steam.columns:
    col_type = clean_df_steam.schema[c].dataType

    if isinstance(col_type, (ArrayType, StructType)):
        print(f"Column {c} skipped because it is an ArrayType or StructType.")
        continue

    if isinstance(col_type, StringType):
        # String columns: count the values listed in empty_values
        count_empty_c = clean_df_steam.filter(F.col(c).isin(empty_values)).count()
    else:
        # Other types (boolean, numeric, timestamp): only null counts as missing,
        # since False and True are both valid values
        count_empty_c = clean_df_steam.filter(F.col(c).isNull()).count()

    print(f"{c}: {count_empty_c}")


appid: 0
Column categories skipped because it is an ArrayType or StructType.
ccu: 0
developer: 128
discount: 0
genre: 161
header_image: 0
initialprice: 0
languages: 11
name: 0
negative: 0
owners: 0
positive: 0
publisher: 154
required_age: 0
short_description: 37
Column tags skipped because it is an ArrayType or StructType.
type: 0
website: 25217
linux: 0
mac: 0
windows: 0
release_date_clean: 222
price_int: 0


In [0]:
print("Number of null values in website:", clean_df_steam.filter(F.col("website").isNull()).count())
print("Number of empty (\"\") values in website:", clean_df_steam.filter(F.col("website") == "").count())
print("Number of 'None' values in website:", clean_df_steam.filter(F.col("website") == "None").count())
print("Number of 'NaN' values in website:", clean_df_steam.filter(F.col("website") == "NaN").count())


Number of null values in website: 0
Number of empty ("") values in website: 25217
Number of 'None' values in website: 0
Number of 'NaN' values in website: 0


### Apart from website, the values ["", "None", "NaN"] affect only a small share of the 55,691 rows (at most 161 for genre, about 0.3%), which is why I chose not to reprocess them. The website field is not always filled in, hence the 25,217 empty values.
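
To put these counts in perspective, the short calculation below converts the figures reported in the outputs above into shares of the 55,691 rows. The dictionary is copied by hand from those outputs; nothing here is recomputed from the dataset, and the helper name `missing_share` is only illustrative.

```python
TOTAL_ROWS = 55691

# Counts copied by hand from the notebook outputs above
missing_counts = {
    "developer": 128,
    "genre": 161,
    "languages": 11,
    "publisher": 154,
    "short_description": 37,
    "website": 25217,
    "release_date_clean": 222,
}


def missing_share(count, total=TOTAL_ROWS):
    """Percentage of rows affected, rounded to two decimals."""
    return round(100 * count / total, 2)


# Print columns from most to least affected
for column, count in sorted(missing_counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{column:<20} {count:>6} {missing_share(count):>6.2f}%")

# 222 equals 99 + 123, which is consistent with the empty strings and the
# 7-character values both failing to parse as dates
assert 99 + 123 == missing_counts["release_date_clean"]
```

Only `website` stands out, at roughly 45% of rows; every other column stays below half a percent.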

# Saving the cleaned file

In [0]:
clean_df_steam.write.mode("overwrite").json("/FileStore/export/steam_clean")


![image.png](attachment:image.png)