https://store.steampowered.com/tags/en/Action/

1. Try extracting the names of the top games from this page.

2. What tags contain the prices? Can you extract the price information?

3. Get all of the header tags on the page.

4. Can you get the text from each span tag with class equal to "top_tag"?

5. Under the "Narrow by Tag" section, there is a collection of tags (e.g. "Indie", "Adventure", etc.). Write code to return these tags.

6. What else can be scraped from this webpage or others on the site?

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
base_url = 'https://store.steampowered.com/tags/en/Action/'

In [3]:
r = requests.get(base_url)
r

<Response [200]>

In [4]:
html = r.content

In [5]:
soup = BeautifulSoup(html, 'lxml')

In [6]:
with open('steamapps.html', 'wb') as steam_file:
    steam_file.write(soup.prettify('utf-8'))

In [11]:
new_releases = soup.find('div', {'id':'NewReleasesRows'}).find_all('a', class_='tab_item')

In [17]:
new_releases[0].find('div', class_='tab_item_name').text

'Streets of Rage 4'

### Extract Top Rated Games

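By inspecting the HTML, we can see that the top-rated games sit inside a `div` tag with id `TopRatedRows` (an `id`, not a class, which is why the code below searches on `{'id': 'TopRatedRows'}`).

Within that `div`, each game is an `a` tag with class `tab_item`.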

In [18]:
top_rated = soup.find('div', {'id':'TopRatedRows'}).find_all('a', class_='tab_item')

In [19]:
top_rated

[<a class="tab_item" data-ds-appid="730" data-ds-crtrids="[4]" data-ds-descids="[2,5]" data-ds-itemkey="App_730" data-ds-tagids="[1663,1774,3859,3878,19,5711,5055]" href="https://store.steampowered.com/app/730/CounterStrike_Global_Offensive/?snr=1_241_4_action_tab-TopRated" onmouseout="HideGameHover( this, event, 'global_hover' )" onmouseover="GameHover( this, event, 'global_hover', {&quot;type&quot;:&quot;app&quot;,&quot;id&quot;:730,&quot;params&quot;:{&quot;bDisableHover&quot;:false},&quot;public&quot;:1,&quot;v6&quot;:1} );">
 <div class="tab_item_cap">
 <img class="tab_item_cap_img" src="https://steamcdn-a.akamaihd.net/steam/apps/730/capsule_184x69.jpg?t=1585696022"/>
 </div>
 <div class="discount_block tab_item_discount no_discount" data-price-final="104900"><div class="discount_prices"><div class="discount_final_price">Free to Play</div></div></div> <div class="tab_item_content">
 <div class="tab_item_name">Counter-Strike: Global Offensive</div>
 <div class="tab_item_details">
 

In [20]:
len(top_rated)

15

In [21]:
top_rated_games = [game.find('div', class_='tab_item_name').text for game in top_rated]

In [22]:
top_rated_games

['Counter-Strike: Global Offensive',
 "Tom Clancy's Rainbow Six® Siege",
 'Warframe',
 'Left 4 Dead 2',
 'Counter-Strike',
 'Borderlands 2',
 'Tomb Raider',
 'PAYDAY 2',
 'Destiny 2',
 'Counter-Strike: Source',
 'Half-Life 2',
 'BioShock Infinite',
 'Risk of Rain 2',
 'Mount & Blade: Warband',
 'MONSTER HUNTER: WORLD']
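The same two-step lookup (find the section `div` by id, then every `a.tab_item` inside it) worked for both `NewReleasesRows` and `TopRatedRows`, so it can be wrapped in a small helper. This is a sketch: `tab_item_names` is a hypothetical name, and it returns an empty list instead of raising when the section id is not found (for example, if Steam changes the page layout).

```python
from bs4 import BeautifulSoup


def tab_item_names(soup, rows_id):
    """Return the game names in the tab section whose div has id == rows_id."""
    section = soup.find("div", id=rows_id)
    if section is None:
        # Section missing (wrong id or changed layout): avoid None.find_all().
        return []
    names = []
    for item in section.find_all("a", class_="tab_item"):
        name_div = item.find("div", class_="tab_item_name")
        if name_div is not None:
            names.append(name_div.get_text(strip=True))
    return names


# Offline demo on a hand-written fragment shaped like the Steam markup above.
demo_soup = BeautifulSoup(
    '<div id="TopRatedRows">'
    '<a class="tab_item"><div class="tab_item_name">Warframe</div></a>'
    '<a class="tab_item"><div class="tab_item_name">Tomb Raider</div></a>'
    "</div>",
    "html.parser",
)
demo_names = tab_item_names(demo_soup, "TopRatedRows")
```

On the live page, `tab_item_names(soup, 'TopRatedRows')` should give the same list as `top_rated_games`, and `tab_item_names(soup, 'NewReleasesRows')` the new releases.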

### Price of each game

Within each `a` tag, the price is given in a `div` tag with class `discount_final_price`.

In [32]:
top_rated_prices = []
for game in top_rated:
    try:
        top_rated_prices.append(game.find('div', class_='discount_final_price').text)
    except AttributeError:
        # find() returned None: this entry has no discount_final_price div
        top_rated_prices.append(None)

In [33]:
top_rated_prices

['Free to Play',
 '₹ 999',
 'Free to Play',
 '₹ 349',
 '₹ 349',
 '₹ 999',
 None,
 '₹ 349',
 'Free To Play',
 '₹ 349',
 '₹ 349',
 '₹ 1,499',
 '₹ 719',
 '₹ 699',
 '₹ 2,497']
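The `try`/`except AttributeError` works because `find` returns `None` when an entry (here presumably Tomb Raider) has no `discount_final_price` div, and `None.text` raises. A hedged alternative checks for `None` explicitly, so an unrelated `AttributeError` cannot be silently recorded as a missing price. `text_or_none` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, name, class_name):
    """Stripped text of the first `name` child with class `class_name`, else None."""
    child = parent.find(name, class_=class_name)
    return child.get_text(strip=True) if child is not None else None


# Offline demo: one entry with a price div and one without.
demo_soup = BeautifulSoup(
    '<a class="tab_item"><div class="discount_final_price">₹ 349</div></a>'
    '<a class="tab_item"><div class="tab_item_name">No price here</div></a>',
    "html.parser",
)
demo_prices = [
    text_or_none(item, "div", "discount_final_price")
    for item in demo_soup.find_all("a", class_="tab_item")
]
```

Against the live page this would read `[text_or_none(game, 'div', 'discount_final_price') for game in top_rated]`.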

### Get all of the header tags

Only `h2` headings are present on the page.

In [38]:
h2_tags = soup.find_all('h2')

In [41]:
h2_headings = [tag.text.strip() for tag in h2_tags]

In [42]:
h2_headings

['Browsing Action',
 'Daily Deal',
 'Spotlight',
 'Weekend Deal',
 'Weekend Deal',
 'Weekend Deal',
 'Narrow By Tag',
 'Recommended Specials',
 'Games streaming now']
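To confirm that `h2` is the only heading level used, one can search `h1` to `h6` in a single call: `find_all` accepts a compiled regular expression and matches it against tag names. A sketch; `heading_texts` is a hypothetical helper.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup

# Matches h1 ... h6 but not other tags starting with "h" (hr, head, html).
HEADING_RE = re.compile(r"^h[1-6]$")


def heading_texts(soup):
    """(tag name, stripped text) for every h1-h6 element, in document order."""
    return [(tag.name, tag.get_text(strip=True)) for tag in soup.find_all(HEADING_RE)]


# Offline demo: mixed heading levels plus non-heading tags that must be ignored.
demo_soup = BeautifulSoup(
    "<h1>Title</h1><p>body</p><h2> Daily Deal </h2><h2>Weekend Deal</h2><hr/>",
    "html.parser",
)
demo_headings = heading_texts(demo_soup)
# How many headings of each level were found.
demo_levels = Counter(name for name, _ in demo_headings)
```

On the live page, `Counter(name for name, _ in heading_texts(soup))` would show which levels actually occur.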

### Get the text from each span tag with class equal to "top_tag"

In [43]:
span_top_tags = soup.find_all('span', class_='top_tag')
span_top_tags

[<span class="top_tag">Action</span>,
 <span class="top_tag">, Indie</span>,
 <span class="top_tag">, Beat 'em up</span>,
 <span class="top_tag">, Adventure</span>,
 <span class="top_tag">Adventure</span>,
 <span class="top_tag">, RPG</span>,
 <span class="top_tag">, Massively Multiplayer</span>,
 <span class="top_tag">, Action</span>,
 <span class="top_tag">Action</span>,
 <span class="top_tag">, Indie</span>,
 <span class="top_tag">, Adventure</span>,
 <span class="top_tag">, RPG</span>,
 <span class="top_tag">Action</span>,
 <span class="top_tag">, Indie</span>,
 <span class="top_tag">, Retro</span>,
 <span class="top_tag">, Old School</span>,
 <span class="top_tag">Indie</span>,
 <span class="top_tag">, Action</span>,
 <span class="top_tag">, Adventure</span>,
 <span class="top_tag">, Gore</span>,
 <span class="top_tag">Action</span>,
 <span class="top_tag">, Indie</span>,
 <span class="top_tag">, RPG</span>,
 <span class="top_tag">, Violent</span>,
 <span class="top_tag">Free to P

In [45]:
span_top_tags_text = [tag.text.strip(', ') for tag in span_top_tags]
span_top_tags_text

['Action',
 'Indie',
 "Beat 'em up",
 'Adventure',
 'Adventure',
 'RPG',
 'Massively Multiplayer',
 'Action',
 'Action',
 'Indie',
 'Adventure',
 'RPG',
 'Action',
 'Indie',
 'Retro',
 'Old School',
 'Indie',
 'Action',
 'Adventure',
 'Gore',
 'Action',
 'Indie',
 'RPG',
 'Violent',
 'Free to Play',
 'Massively Multiplayer',
 'Action',
 'Strategy',
 'Simulation',
 'Action',
 'Indie',
 'Adventure',
 'Casual',
 'Indie',
 'Action',
 'Simulation',
 'Action',
 'Adventure',
 'Indie',
 'Gore',
 'Massively Multiplayer',
 'Simulation',
 'Action',
 'VR',
 'RPG',
 'Action',
 'Online Co-Op',
 'Looter Shooter',
 'Action',
 'Zombies',
 'Horror',
 'Female Protagonist',
 'Action',
 'Anime',
 'Fighting',
 '2D Fighter',
 'Action',
 'Adventure',
 'Nudity',
 'Early Access',
 'Action',
 'Multiplayer',
 'Hunting',
 'Co-op',
 'Action',
 'Indie',
 "Beat 'em up",
 'Adventure',
 'RPG',
 'Action',
 'Online Co-Op',
 'Looter Shooter',
 'Co-op',
 'Multiplayer',
 'Action',
 'Open World',
 'Action',
 'Masterpiece',
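The flat list above loses which tags belong to which game. A sketch that keeps them grouped, assuming (as the `tab_item_top_tags` div used later in this notebook suggests) that each game's `top_tag` spans sit together inside one `div` with class `tab_item_top_tags`. `tags_per_game` is a hypothetical helper. Note that `str.strip(', ')` removes any run of commas and spaces at either end, not the literal substring `", "`.

```python
from bs4 import BeautifulSoup


def tags_per_game(soup):
    """Group span.top_tag texts by the div.tab_item_top_tags block that contains them."""
    grouped = []
    for block in soup.find_all("div", class_="tab_item_top_tags"):
        # strip(", ") trims commas/spaces at both ends, which drops the ", "
        # separator prefixed to every tag except the first.
        tags = [
            span.get_text().strip(", ")
            for span in block.find_all("span", class_="top_tag")
        ]
        grouped.append(tags)
    return grouped


# Offline demo: two games, with the ", "-prefixed spans seen in the output above.
demo_soup = BeautifulSoup(
    '<div class="tab_item_top_tags"><span class="top_tag">Action</span>'
    '<span class="top_tag">, Indie</span><span class="top_tag">, Retro</span></div>'
    '<div class="tab_item_top_tags"><span class="top_tag">Adventure</span>'
    '<span class="top_tag">, RPG</span></div>',
    "html.parser",
)
demo_groups = tags_per_game(demo_soup)
```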
 

### Under the "Narrow by Tag" section, there is a collection of tags (e.g. "Indie", "Adventure", etc.). Write code to return these tags.

Each tag is an `a` element whose class attribute is `"btnv6_blue_hoverfade btn_small_tall"`; the tag name itself is in a nested `span` with class `tag_name`.

In [47]:
narrow_by_tags = soup.find_all('a', {'class':'btnv6_blue_hoverfade btn_small_tall'})
narrow_by_tags

[<a class="btnv6_blue_hoverfade btn_small_tall" href="https://store.steampowered.com/search/?tags=492,19&amp;snr=1_241_4_action_237">
 <span>
 <div class="tag_count_button">
 <span class="tag_name">Indie</span>
 <span class="tag_count tab_filter_control_count">15,008</span>
 </div>
 </span>
 </a>,
 <a class="btnv6_blue_hoverfade btn_small_tall" href="https://store.steampowered.com/search/?tags=21,19&amp;snr=1_241_4_action_237">
 <span>
 <div class="tag_count_button">
 <span class="tag_name">Adventure</span>
 <span class="tag_count tab_filter_control_count">8,152</span>
 </div>
 </span>
 </a>,
 <a class="btnv6_blue_hoverfade btn_small_tall" href="https://store.steampowered.com/search/?tags=597,19&amp;snr=1_241_4_action_237">
 <span>
 <div class="tag_count_button">
 <span class="tag_name">Casual</span>
 <span class="tag_count tab_filter_control_count">5,963</span>
 </div>
 </span>
 </a>,
 <a class="btnv6_blue_hoverfade btn_small_tall" href="https://store.steampowered.com/search/?tags=418

In [52]:
narrow_by_text = [tag.find('span', class_='tag_name').text
                  for tag in narrow_by_tags]
narrow_by_text

['Indie',
 'Adventure',
 'Casual',
 'Singleplayer',
 'RPG',
 'Early Access',
 'Simulation',
 'Strategy',
 'Multiplayer',
 '2D',
 'Violent',
 'Shooter']
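Passing the two-class string to `find_all` matches only that exact class attribute value, so it would miss a button whose classes appear in the other order. A hedged alternative uses a CSS selector, which requires both classes in any order, and also reads the game count shown next to each tag. `narrow_by_tag_counts` is a hypothetical helper.

```python
from bs4 import BeautifulSoup


def narrow_by_tag_counts(soup):
    """Map each 'Narrow by Tag' name to its comma-formatted game count, as an int."""
    counts = {}
    # The CSS selector requires both classes, in any order.
    for button in soup.select("a.btnv6_blue_hoverfade.btn_small_tall"):
        name = button.select_one("span.tag_name")
        count = button.select_one("span.tag_count")
        if name is None or count is None:
            continue  # a button with these classes but no tag/count inside
        counts[name.get_text(strip=True)] = int(count.get_text(strip=True).replace(",", ""))
    return counts


# Offline demo: the second button lists its classes in the reverse order.
demo_soup = BeautifulSoup(
    '<a class="btnv6_blue_hoverfade btn_small_tall"><span><div class="tag_count_button">'
    '<span class="tag_name">Indie</span>'
    '<span class="tag_count tab_filter_control_count">15,008</span></div></span></a>'
    '<a class="btn_small_tall btnv6_blue_hoverfade"><span class="tag_name">Casual</span>'
    '<span class="tag_count">5,963</span></a>',
    "html.parser",
)
demo_counts = narrow_by_tag_counts(demo_soup)
```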

### Extra scraping

We can also extract the tags associated with each game in our top-rated list.

In [57]:
top_rated_game_tags = [game.find('div', class_='tab_item_top_tags').text
                       for game in top_rated]
top_rated_game_tags

['FPS, Shooter, Multiplayer, Competitive',
 'FPS, Hero Shooter, Multiplayer, Tactical',
 'Free to Play, Looter Shooter, Action, Co-op',
 'Zombies, Co-op, FPS, Multiplayer',
 'Action, FPS, Multiplayer, Shooter',
 'Loot, Shooter, Action, Multiplayer',
 'Adventure, Action, Female Protagonist, Third Person',
 'Co-op, Action, FPS, Heist',
 'Free to Play, Looter Shooter, FPS, Multiplayer',
 'Shooter, Action, FPS, Multiplayer',
 'FPS, Action, Sci-fi, Classic',
 'FPS, Story Rich, Action, Singleplayer',
 'Third-Person Shooter, Action Roguelike, Action, Multiplayer',
 'Medieval, RPG, Open World, Strategy',
 'Co-op, Multiplayer, Action, Open World']
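Each `a.tab_item` also carries the Steam app id (`data-ds-appid`) and the store link (`href`), as the raw `top_rated` output earlier shows. A sketch that collects both; `tab_item_links` is a hypothetical helper, and dropping the `?snr=...` query string is a cosmetic choice, not something the site requires.

```python
def tab_item_links(items):
    """Return (app id, store URL) pairs for a sequence of a.tab_item elements.

    Only .get() is used, so BeautifulSoup Tags (or plain dicts) both work;
    .get() returns None when an attribute is missing.
    """
    links = []
    for item in items:
        href = item.get("href")
        # Drop the query string (the '?snr=...' suffix) to keep a clean store URL.
        clean_url = href.split("?", 1)[0] if href else None
        links.append((item.get("data-ds-appid"), clean_url))
    return links
```

With the notebook's variables this would be `tab_item_links(top_rated)`.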

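### Create a namedtuple and a DataFrame for the top-rated games' attributes

The three lists built so far, `top_rated_games`, `top_rated_prices`, and `top_rated_game_tags`, are combined below into named records and a table.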

In [58]:
from collections import namedtuple
import pandas as pd

In [62]:
top_games = namedtuple('TopGames','Name Price Category')

In [60]:
for data in zip(top_rated_games, top_rated_prices, top_rated_game_tags):
    print(top_games(*data))

TopGames(Name='Counter-Strike: Global Offensive', Price='Free to Play', Category='FPS, Shooter, Multiplayer, Competitive')
TopGames(Name="Tom Clancy's Rainbow Six® Siege", Price='₹ 999', Category='FPS, Hero Shooter, Multiplayer, Tactical')
TopGames(Name='Warframe', Price='Free to Play', Category='Free to Play, Looter Shooter, Action, Co-op')
TopGames(Name='Left 4 Dead 2', Price='₹ 349', Category='Zombies, Co-op, FPS, Multiplayer')
TopGames(Name='Counter-Strike', Price='₹ 349', Category='Action, FPS, Multiplayer, Shooter')
TopGames(Name='Borderlands 2', Price='₹ 999', Category='Loot, Shooter, Action, Multiplayer')
TopGames(Name='Tomb Raider', Price=None, Category='Adventure, Action, Female Protagonist, Third Person')
TopGames(Name='PAYDAY 2', Price='₹ 349', Category='Co-op, Action, FPS, Heist')
TopGames(Name='Destiny 2', Price='Free To Play', Category='Free to Play, Looter Shooter, FPS, Multiplayer')
TopGames(Name='Counter-Strike: Source', Price='₹ 349', Category='Shooter, Act
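The namedtuples can also feed the DataFrame directly: pandas takes the column names from the namedtuple's fields, so the column labels cannot drift from the record definition. A sketch with hypothetical names (`TopGame`, `build_top_games_frame`); the length check is an added safeguard, since `zip` silently stops at the shortest list and the price list relies on `None` placeholders to stay aligned.

```python
from collections import namedtuple

import pandas as pd

TopGame = namedtuple("TopGame", "Name Price Category")


def build_top_games_frame(names, prices, categories):
    """Zip three parallel lists into TopGame rows, then into a DataFrame."""
    if not len(names) == len(prices) == len(categories):
        # zip() would silently stop at the shortest list and drop rows.
        raise ValueError("names, prices and categories must have the same length")
    rows = [TopGame(*fields) for fields in zip(names, prices, categories)]
    # pandas infers the columns from the namedtuple's _fields.
    return pd.DataFrame(rows)


demo_frame = build_top_games_frame(
    ["Warframe", "Tomb Raider"],
    ["Free to Play", None],
    [
        "Free to Play, Looter Shooter, Action, Co-op",
        "Adventure, Action, Female Protagonist, Third Person",
    ],
)
```

With the notebook's lists: `build_top_games_frame(top_rated_games, top_rated_prices, top_rated_game_tags)`.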

In [63]:
df = pd.DataFrame({
    'Name':top_rated_games,
    'Price':top_rated_prices,
    'Category':top_rated_game_tags
})

In [64]:
df

|   | Name | Price | Category |
|--:|:-----|:------|:---------|
| 0 | Counter-Strike: Global Offensive | Free to Play | FPS, Shooter, Multiplayer, Competitive |
| 1 | Tom Clancy's Rainbow Six® Siege | ₹ 999 | FPS, Hero Shooter, Multiplayer, Tactical |
| 2 | Warframe | Free to Play | Free to Play, Looter Shooter, Action, Co-op |
| 3 | Left 4 Dead 2 | ₹ 349 | Zombies, Co-op, FPS, Multiplayer |
| 4 | Counter-Strike | ₹ 349 | Action, FPS, Multiplayer, Shooter |
| 5 | Borderlands 2 | ₹ 999 | Loot, Shooter, Action, Multiplayer |
| 6 | Tomb Raider |  | Adventure, Action, Female Protagonist, Third P... |
| 7 | PAYDAY 2 | ₹ 349 | Co-op, Action, FPS, Heist |
| 8 | Destiny 2 | Free To Play | Free to Play, Looter Shooter, FPS, Multiplayer |
| 9 | Counter-Strike: Source | ₹ 349 | Shooter, Action, FPS, Multiplayer |


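Note that `Price` still holds strings, so `sort_values` orders it lexicographically rather than numerically. That is why `₹ 2,497` ends up below `₹ 349` in the output of the next cell; the remaining rows are cut off in this truncated display.

In [67]:
df.sort_values('Price', ascending=False)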

|    | Name | Price | Category |
|---:|:-----|:------|:---------|
| 1 | Tom Clancy's Rainbow Six® Siege | ₹ 999 | FPS, Hero Shooter, Multiplayer, Tactical |
| 5 | Borderlands 2 | ₹ 999 | Loot, Shooter, Action, Multiplayer |
| 12 | Risk of Rain 2 | ₹ 719 | Third-Person Shooter, Action Roguelike, Action... |
| 13 | Mount & Blade: Warband | ₹ 699 | Medieval, RPG, Open World, Strategy |
| 3 | Left 4 Dead 2 | ₹ 349 | Zombies, Co-op, FPS, Multiplayer |
| 4 | Counter-Strike | ₹ 349 | Action, FPS, Multiplayer, Shooter |
| 7 | PAYDAY 2 | ₹ 349 | Co-op, Action, FPS, Heist |
| 9 | Counter-Strike: Source | ₹ 349 | Shooter, Action, FPS, Multiplayer |
| 10 | Half-Life 2 | ₹ 349 | FPS, Action, Sci-fi, Classic |
| 14 | MONSTER HUNTER: WORLD | ₹ 2,497 | Co-op, Multiplayer, Action, Open World |
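To order games by actual price rather than by string comparison, one option is to sort on a parsed numeric copy of `Price`. This is a sketch: `parse_price` and `sort_by_price` are hypothetical helpers, and the parsing rules (treat "Free to Play" and "Free To Play" as 0, otherwise take the first number and drop thousands separators, anything else becomes missing) are assumptions based only on the values scraped above.

```python
import re

import pandas as pd


def parse_price(price_text):
    """Hypothetical parser: '₹ 1,499' -> 1499.0, 'Free to Play' -> 0.0, else None."""
    if not isinstance(price_text, str):
        return None  # covers None (and NaN) for entries with no price
    if price_text.strip().lower() == "free to play":
        return 0.0  # treats 'Free to Play' and 'Free To Play' alike
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price_text)
    return float(match.group().replace(",", "")) if match else None


def sort_by_price(frame, ascending=False):
    """Sort on a numeric copy of the Price column; missing prices go last."""
    return frame.assign(PriceValue=frame["Price"].map(parse_price)).sort_values(
        "PriceValue", ascending=ascending, na_position="last"
    )


demo_sorted = sort_by_price(
    pd.DataFrame(
        {
            "Name": ["A", "B", "C", "D"],
            "Price": ["₹ 349", "₹ 2,497", None, "Free To Play"],
        }
    )
)
```

On the notebook's frame this would be `sort_by_price(df)`, which keeps the original `Price` strings and adds a `PriceValue` column used only for ordering.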
