# **WEB SCRAPING API**

**Class 2024A**

Rebriane Atitha G. (039)

Nabilah Hilmi R. (053)

Kekila Akmal N. (144)

Web scraping with an API:
1. Identify a website or service that provides an API
2. Obtain an API key
3. Send a request to the API
4. Parse the response data
5. Save the data
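Steps 3 to 5 can be sketched without any network access. The endpoint `https://api.example.com/items`, the `YOUR_API_KEY` placeholder, the canned response body, and the helper names are all hypothetical; they only illustrate the flow and are not part of the notebook that follows.

```python
import csv
import io
import json
from urllib.parse import urlencode


def build_url(base_url, params):
    """Step 3: attach query parameters to the endpoint."""
    return f"{base_url}?{urlencode(params)}"


def parse_items(raw_text, key):
    """Step 4: decode the JSON body and pull out the list stored under `key`."""
    return json.loads(raw_text).get(key, [])


def to_csv(rows, fieldnames):
    """Step 5: serialise a list of dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical endpoint and canned response, used only for illustration.
url = build_url("https://api.example.com/items", {"key": "YOUR_API_KEY", "limit": 2})
fake_body = '{"items": [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]}'
rows = parse_items(fake_body, "items")
csv_text = to_csv(rows, ["id", "name"])
print(url)
print(csv_text)
```

In a real run, the canned body would be replaced by the text of an HTTP response.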

**Initialization and Library Imports**

In [None]:
import requests
import json
import pandas as pd
import time
api_key = '41984B7715C8A57676000E24258DC3E3'
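Hard-coding the key in a notebook exposes it to anyone who can read the file. A minimal sketch of an alternative, assuming the key has been exported in an environment variable; the variable name `STEAM_API_KEY` and the helper `load_api_key` are hypothetical and not part of the original notebook.

```python
import os


def load_api_key(var_name="STEAM_API_KEY"):
    """Return the key stored in the environment variable `var_name`.

    The variable name is an assumption; pick whatever name suits your setup.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage (after exporting the variable in the shell that starts the notebook):
# api_key = load_api_key()
```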

**Fetching the App List**

In [None]:
get_app_list_url = "http://api.steampowered.com/ISteamApps/GetAppList/v2/"
response = requests.get(get_app_list_url)
data = response.json()
app_list = data["applist"]["apps"]
print(f"Total apps: {len(app_list)}")

Total apps: 267537
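Before the list is used further, it may be worth dropping entries that cannot be processed. The sketch below assumes only the payload shape read by the cell above (`applist` → `apps` → `appid`/`name`). Whether the real response contains blank names or repeated IDs is not confirmed here; the sample payload is hand-written and `extract_apps` is an illustrative helper.

```python
def extract_apps(payload):
    """Return apps with a non-empty name, keeping the first entry per appid."""
    apps = payload.get("applist", {}).get("apps", [])
    seen = set()
    cleaned = []
    for app in apps:
        appid = app.get("appid")
        name = (app.get("name") or "").strip()
        # Skip entries with no ID, no usable name, or an ID already kept.
        if appid is None or not name or appid in seen:
            continue
        seen.add(appid)
        cleaned.append({"appid": appid, "name": name})
    return cleaned


# Hand-written sample; the blank-name and duplicate entries are made up.
sample = {"applist": {"apps": [
    {"appid": 10, "name": "Counter-Strike"},
    {"appid": 10, "name": "Counter-Strike"},
    {"appid": 99999, "name": ""},
    {"appid": 20, "name": "Team Fortress Classic"},
]}}
cleaned_apps = extract_apps(sample)
print(cleaned_apps)
```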


**Saving the App List to a JSON File**

In [None]:
with open("response.json", "w", encoding="utf-8") as f:
    json.dump(app_list, f, indent=4, ensure_ascii=False)


**Fetching Details for Each Game**

In [None]:
# Reload the app list saved in the previous step
with open("response.json", "r", encoding="utf-8") as f:
    data = json.load(f)

games_data = []
app_details_url = "http://store.steampowered.com/api/appdetails"

# Only the first 100 apps are queried to keep the sample small
for app in data[:100]:
    appid = app["appid"]
    name = app["name"]

    detail_url = f"{app_details_url}?appids={appid}&l=english"
    try:
        res = requests.get(detail_url, timeout=10)
        # Parse the body only when the server actually returned JSON
        if res.headers.get("Content-Type", "").startswith("application/json"):
            app_info = res.json().get(str(appid), {})
        else:
            app_info = {}

        if app_info.get("success") and "data" in app_info:
            game = app_info["data"]
            genres = ", ".join(g["description"] for g in game.get("genres", []))
            # A missing price_overview is recorded as "Gratis" (free)
            price = game.get("price_overview", {}).get("final_formatted", "Gratis")
        else:
            # No store data came back for this appid
            genres, price = "", "Unknown"

        games_data.append([appid, name, genres, price])

    except Exception as e:
        # Keep the row so that failed requests stay visible in the output
        print(f"Error {appid}: {e}")
        games_data.append([appid, name, "", "Error"])

    # Pause briefly between requests
    time.sleep(0.3)
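The parsing logic inside the loop can be pulled into a pure function, which makes it testable without calling the API. This is a sketch under the assumption that the response has the shape the loop above reads (`{appid: {"success": ..., "data": {...}}}`); the sample payload is hand-written, and `parse_app_details` is an illustrative name.

```python
def parse_app_details(payload, appid):
    """Return (genres, price) extracted from one appdetails response body."""
    info = payload.get(str(appid)) or {}
    if not (info.get("success") and isinstance(info.get("data"), dict)):
        return "", "Unknown"
    game = info["data"]
    genres = ", ".join(g["description"] for g in game.get("genres", []))
    # Same fallback as the notebook: no price_overview is recorded as "Gratis".
    price = game.get("price_overview", {}).get("final_formatted", "Gratis")
    return genres, price


# Hand-written payloads mirroring the fields the loop reads.
priced = {"10": {"success": True, "data": {
    "genres": [{"id": "1", "description": "Action"}],
    "price_overview": {"final_formatted": "$9.99"},
}}}
no_data = {"5": {"success": False}}

print(parse_app_details(priced, 10))
print(parse_app_details(no_data, 5))
```

With such a helper, the loop body reduces to the request, one call to the function, and the `append`.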


**Saving the Data to a CSV File**

In [None]:
import pandas as pd

# Build a table from the collected rows
df = pd.DataFrame(
    games_data,
    columns=['AppID', 'Name', 'Genres', 'Price']
)
display(df)
df.to_csv("steam_games_sample.csv", index=False, encoding="utf-8")
print("\nData saved to steam_games_sample.csv")

| | AppID | Name | Genres | Price |
|---|---|---|---|---|
| 0 | 5 | Dedicated Server | | Unknown |
| 1 | 7 | Steam Client | | Unknown |
| 2 | 8 | winui2 | | Unknown |
| 3 | 10 | Counter-Strike | Action | $9.99 |
| 4 | 20 | Team Fortress Classic | Action | $4.99 |
| ... | ... | ... | ... | ... |
| 95 | 903 | Darwinia Trailer | | Unknown |
| 96 | 904 | Half-Life 2 Trailer | | Unknown |
| 97 | 905 | Half-Life 2: Episode One Trailer | | Unknown |
| 98 | 906 | Rag Doll Kung Fu Trailer | | Unknown |


Data saved to steam_games_sample.csv
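A quick sanity check after writing the file is to read it back and count how many rows carry usable price information. The sketch below does this with the standard `csv` module on an in-memory sample copied from the first rows of the output above. With the real file, `open("steam_games_sample.csv", newline="", encoding="utf-8")` would take the place of the `io.StringIO` object. The helper name and the status labels are illustrative.

```python
import csv
import io
from collections import Counter

# First five rows of the table shown above, as they would appear in the CSV.
sample_csv = """AppID,Name,Genres,Price
5,Dedicated Server,,Unknown
7,Steam Client,,Unknown
8,winui2,,Unknown
10,Counter-Strike,Action,$9.99
20,Team Fortress Classic,Action,$4.99
"""


def price_status_counts(handle):
    """Count rows by price status: 'Unknown', 'Error', or anything else."""
    counts = Counter()
    for row in csv.DictReader(handle):
        if row["Price"] in ("Unknown", "Error"):
            counts[row["Price"]] += 1
        else:
            counts["Priced or free"] += 1
    return dict(counts)


summary = price_status_counts(io.StringIO(sample_csv))
print(summary)
```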


**Text Pre-processing**

Lowercasing Game Names

In [42]:
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Fall back to the saved CSV when df is not already in memory
try:
    df
except NameError:
    df = pd.read_csv("steam_games_sample.csv")

df["Clean_Name"] = df["Name"].str.lower()

tfidf = TfidfVectorizer(max_features=500)  # keep at most 500 terms
X_name_tfidf = tfidf.fit_transform(df["Clean_Name"])

print("TF-IDF shape for titles:", X_name_tfidf.shape)
# get_feature_names_out() is sorted alphabetically, so this shows the first 10 terms, not the 10 most important ones
print("First 10 vocabulary terms (alphabetical):", tfidf.get_feature_names_out()[:10])

TF-IDF shape for titles: (100, 130)
First 10 vocabulary terms (alphabetical): ['2006' '2007' '2011' '2012' 'achievements' 'add' 'alien' 'antenna'
 'artwork' 'authoring']
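`TfidfVectorizer` lowercases its input by default and tokenizes with the pattern `(?u)\b\w\w+\b`, which keeps only tokens of two or more word characters. Single-character tokens such as "2" and "4" therefore never reach the vocabulary. The sketch below reproduces that tokenization with the standard `re` module to show the effect; `sklearn_style_tokens` is an illustrative helper, not a scikit-learn function.

```python
import re

# Default token_pattern of scikit-learn's text vectorizers.
TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def sklearn_style_tokens(text):
    """Lowercase the text and keep tokens of at least two word characters."""
    return re.findall(TOKEN_PATTERN, text.lower())


tokens_a = sklearn_style_tokens("Left 4 Dead 2 Authoring Tools")
tokens_b = sklearn_style_tokens("Half-Life 2: Episode One Trailer")
print(tokens_a)  # ['left', 'dead', 'authoring', 'tools']
print(tokens_b)  # ['half', 'life', 'episode', 'one', 'trailer']
```

If the sequel numbers matter for the analysis, a custom `token_pattern` such as `r"(?u)\b\w+\b"` can be passed to the vectorizer.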


In [43]:
df["Lower_Name"] = df["Name"].str.lower()
df["Clean_Name"] = (
    df["Name"]
    .str.lower()
    .str.replace(r"[^a-z0-9\s]", " ", regex=True)  # replace symbols with spaces
    .str.replace(r"\s+", " ", regex=True)          # collapse repeated whitespace
    .str.strip()
)

print(df[["Name", "Lower_Name"]].sample(100, random_state=42))


                                  Name                          Lower_Name
83                  Steam Achievements                  steam achievements
53          Left 4 Dead 2 Preorder DLC          left 4 dead 2 preorder dlc
70         Portal 2 - Gamestop PS3 DLC         portal 2 - gamestop ps3 dlc
45         Left 4 Dead Authoring Tools         left 4 dead authoring tools
44        Left 4 Dead Dedicated Server        left 4 dead dedicated server
..                                 ...                                 ...
60  Dota 2 - Inflatable Donkey Courier  dota 2 - inflatable donkey courier
71        Portal 2 - Bot Paint Job DLC        portal 2 - bot paint job dlc
14               Half-Life: Blue Shift               half-life: blue shift
92                        Zombie Movie                        zombie movie
51       Left 4 Dead 2 Authoring Tools       left 4 dead 2 authoring tools

[100 rows x 2 columns]
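The same cleaning chain can be written as a plain function on a single string, which is convenient for checking individual titles. A minimal sketch using the regular expressions from the cell above; `clean_name` is an illustrative helper name.

```python
import re


def clean_name(name):
    """Lowercase, replace non-alphanumeric symbols with spaces, tidy whitespace."""
    lowered = name.lower()
    no_symbols = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", no_symbols).strip()


print(clean_name("Half-Life: Blue Shift"))        # half life blue shift
print(clean_name("Portal 2 - Gamestop PS3 DLC"))  # portal 2 gamestop ps3 dlc
```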


Remove Frequent Words

In [44]:
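from collections import Counter
import pandas as pd

# Count how often each token appears across all cleaned names.
# word_freq is not defined in an earlier cell; this definition is an assumption.
word_freq = Counter(" ".join(df["Clean_Name"]).split())

word_freq_df = pd.DataFrame(list(word_freq.items()), columns=["Word", "Frequency"])
word_freq_df = word_freq_df.sort_values(by="Frequency", ascending=False)

# The 20 most frequent words are the candidates for removal
frequent_df = word_freq_df.head(20)

print("Frequent Words:")
print(frequent_df)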


Frequent Words:
          Word  Frequency
34           2         41
14        half         14
15        life         14
45      portal         13
29      source         12
53         dlc         11
58        left         11
59           4         11
60        dead         11
0    dedicated         10
1       server         10
2        steam          9
67        dota          9
5      counter          9
6       strike          9
7         team          7
8     fortress          7
124    trailer          6
30         sdk          5
35        demo          4
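The cell above only lists the most frequent words. A minimal sketch of the removal step itself, using only the standard library; the helper `remove_frequent`, the cut-off `top_n`, and the short sample list are illustrative and not taken from the notebook.

```python
from collections import Counter


def remove_frequent(names, top_n):
    """Drop the `top_n` most frequent tokens from every name."""
    counts = Counter(word for name in names for word in name.split())
    frequent = {word for word, _ in counts.most_common(top_n)}
    return [
        " ".join(word for word in name.split() if word not in frequent)
        for name in names
    ]


# Small hand-picked sample of already-cleaned names.
names = ["half life 2", "half life 2 episode one", "portal 2", "left 4 dead 2"]
without_frequent = remove_frequent(names, top_n=1)
print(without_frequent)
```

Here "2" is the single most frequent token, so it is the only one removed.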


Remove Rare Words

In [47]:
import pandas as pd

word_freq_df = pd.DataFrame(word_freq.items(), columns=["Word", "Frequency"])
word_freq_df = word_freq_df.sort_values(by="Frequency", ascending=False)

rare_df = word_freq_df[word_freq_df["Frequency"] == 1].head(100)

print("\nRare Words:")
print(rare_df)



Rare Words:
          Word  Frequency
32        2006          1
23     deleted          1
51   community          1
48         two          1
46       first          1
..         ...        ...
127       doll          1
128       kung          1
129         fu          1
130        red          1
131  orchestra          1

[82 rows x 2 columns]
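As with frequent words, the cell above identifies the rare words but does not remove them. A minimal standard-library sketch of the removal; `remove_rare`, the `min_count` threshold, and the sample names are illustrative assumptions.

```python
from collections import Counter


def remove_rare(names, min_count=2):
    """Keep only tokens that occur at least `min_count` times overall."""
    counts = Counter(word for name in names for word in name.split())
    return [
        " ".join(word for word in name.split() if counts[word] >= min_count)
        for name in names
    ]


# Small hand-picked sample of already-cleaned names.
names = ["half life 2", "half life 2 episode one", "portal 2"]
without_rare = remove_rare(names)
print(without_rare)
```

Note that a title made up mostly of rare words can shrink to almost nothing, as "portal 2" does here, so the threshold is worth choosing with care.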


Remove Symbols from Price

In [46]:
df["Clean_Price"] = (
    df["Price"]
    .str.replace(r"[^\d.]", "", regex=True)  # keep digits and the decimal point (assumes "$9.99"-style prices)
    .replace("", "0")                        # empty string (no price) becomes 0
    .astype(float)                           # convert to float
)
print(df[["Clean_Price"]])

    Clean_Price
0          0.00
1          0.00
2          0.00
3          9.99
4          4.99
..          ...
95         0.00
96         0.00
97         0.00
98         0.00
99         0.00

[100 rows x 1 columns]
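A regex that deletes every non-digit character also deletes the decimal point, so "$9.99" would turn into 999. The sketch below is a scalar parser that keeps the decimal part and treats strings without a number as 0. It assumes US-style formatting (period as decimal separator, comma as thousands separator); `parse_price` is an illustrative helper, not part of the notebook.

```python
import re


def parse_price(text):
    """Extract the first decimal number from a price string, or 0.0 if none."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else 0.0


print(parse_price("$9.99"))      # 9.99
print(parse_price("Unknown"))    # 0.0
print(parse_price("$1,299.99"))  # 1299.99
```

It could be applied to the column with `df["Price"].map(parse_price)`.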
