Research Question: 

> How appropriate, and how stereotypically divided by gender, are the free online games directed at children?

> The team shared personal experiences of coming across unusually themed games on large free gaming websites at a young age. They recalled seeing many games with thumbnails of pregnant princesses, foot cleaning, tongue cleaning, mental breakdowns, and dramatic romantic scenarios. These games were marketed specifically towards girls, often by including "girls" in the website title. This led the team to wonder what similarly marketed boy-themed games consisted of.

Methodology:

> The team scraped data from the Girls Go Games website at https://www.girlsgogames.com/allcategories to conduct the research.

> To inspect the relevant features of the page, the team first retrieved it with the requests library and printed it as text using the following code.

In [None]:
#import requests

#r = requests.get("https://www.girlsgogames.com/allcategories")

#print(r.text)

> The following function obtains a list of the links on the website's main page that represent EACH CATEGORY OF GAMES. It imports the requests and BeautifulSoup libraries, builds a list of all the links on the main page, and then limits that list to the links representing categories.

In [9]:
def get_game_category_links(url):
    """ 
    A function to get all game categories from a page of girlsgogames.com.

        Args: A string url to the all categories link on girlsgogames.com.

        Output: A list of urls leading to each game category.
    
    """
    import requests
    from bs4 import BeautifulSoup
    
    #CREATES A LIST OF ALL LINKS PRESENT
    reqs = requests.get(url)
    soup = BeautifulSoup(reqs.text, 'html.parser')
    
    urls = []
    for link in soup.find_all('a'):
        urls.append(link.get('href'))
        
    #REMOVES UNNECESSARY LINKS
    del urls[0:204]
    del urls[len(urls)-11:len(urls)]

    urls.remove("https://www.girlsgogames.com/games/action")
    urls.remove("https://www.girlsgogames.com/games/adventure")
    urls.remove("https://www.girlsgogames.com/games/animals")
    urls.remove("https://www.girlsgogames.com/games/art-and-creativity")
    urls.remove("https://www.girlsgogames.com/games/beauty-games")
    urls.remove("https://www.girlsgogames.com/games/cooking")
    urls.remove("https://www.girlsgogames.com/games/decoration-games")
    urls.remove("https://www.girlsgogames.com/games/dress_up")
    urls.remove("https://www.girlsgogames.com/games/love-games")
    urls.remove("https://www.girlsgogames.com/games/puzzle")
    urls.remove("https://www.girlsgogames.com/games/simulation")
    urls.remove("https://www.girlsgogames.com/games/skill")
    urls.remove("https://www.girlsgogames.com/games/specials")
    urls.remove("https://www.girlsgogames.com/games/sports")

    return urls
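
> The positional deletes above depend on the page keeping exactly the same number of links before and after the category list. As a hedged alternative, the sketch below filters the collected hrefs by URL prefix instead. The name filter_category_links and the sample list are hypothetical, and whether a prefix filter yields the same categories as the positional approach on the live page is an assumption that has not been checked here.

```python
CATEGORY_PREFIX = "https://www.girlsgogames.com/games/"


def filter_category_links(hrefs, excluded=frozenset()):
    """Return unique category URLs in first-seen order, skipping excluded ones."""
    kept = []
    seen = set()
    for href in hrefs:
        # link.get('href') yields None for <a> tags without an href attribute
        if not href or not href.startswith(CATEGORY_PREFIX):
            continue
        if href in excluded or href in seen:
            continue
        seen.add(href)
        kept.append(href)
    return kept


# Hypothetical sample standing in for the hrefs collected from the page
sample_hrefs = [
    None,
    "https://www.girlsgogames.com/",
    "https://www.girlsgogames.com/games/action",
    "https://www.girlsgogames.com/games/io-games",
    "https://www.girlsgogames.com/games/io-games",
    "https://www.girlsgogames.com/game/happy-snakes",
]
excluded = {"https://www.girlsgogames.com/games/action"}
print(filter_category_links(sample_hrefs, excluded))
```

> Passing the fourteen main categories as the excluded set would play the same role as the urls.remove calls, without raising an error if one of them is missing from the page.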

> The function must be run on the Girls Go Games all-categories page to create a variable holding all the game categories.

In [10]:
game_categories = get_game_category_links('https://www.girlsgogames.com/allcategories')
print(game_categories)

['https://www.girlsgogames.com/games/io-games', 'https://www.girlsgogames.com/games/2-player', 'https://www.girlsgogames.com/games/3d-games', 'https://www.girlsgogames.com/games/aim__shoot', 'https://www.girlsgogames.com/games/arcade', 'https://www.girlsgogames.com/games/bomb-it-games', 'https://www.girlsgogames.com/games/boy-games', 'https://www.girlsgogames.com/games/clicker', 'https://www.girlsgogames.com/games/clicking-games', 'https://www.girlsgogames.com/games/easy-games', 'https://www.girlsgogames.com/games/endless-running', 'https://www.girlsgogames.com/games/fidget-spinner', 'https://www.girlsgogames.com/games/flappy-bird', 'https://www.girlsgogames.com/games/flying-games', 'https://www.girlsgogames.com/games/fun', 'https://www.girlsgogames.com/games/funny-games', 'https://www.girlsgogames.com/games/king', 'https://www.girlsgogames.com/games/monster', 'https://www.girlsgogames.com/games/multiplayer', 'https://www.girlsgogames.com/games/nitrome-games', 'https://www.girlsgogames

> The following function obtains a list of the links on each game category page that represent EACH GAME. It imports the requests and BeautifulSoup libraries, builds a list of all the links on the current category page, and then limits that list to the links representing games.

In [11]:
def get_game_links(game_categories):
    """
    A function to get game urls given a list of category links.

        Args: A list of urls to different game category webpages.

        Output: A list of game urls gathered from every category page.

    """
    import requests
    from bs4 import BeautifulSoup

    each_list_of_games = []
    #LOOPS THROUGH EACH CATEGORY
    for category in game_categories:
        #CREATES A LIST OF ALL LINKS PRESENT
        reqs_g = requests.get(category)
        soup_g = BeautifulSoup(reqs_g.text, 'html.parser')
        
        urls_g = []
        for link_g in soup_g.find_all('a'):
            urls_g.append(link_g.get('href'))

        #REMOVES UNNECESSARY LINKS
        del urls_g[0:206]
        category_name = category[35:]
        link_g = f"https://www.girlsgogames.com/games/{category_name}"
        if link_g in urls_g:
            index_of_second = urls_g.index(f"https://www.girlsgogames.com/games/{category_name}")
            to_use = len(urls_g) - index_of_second
            del urls_g[len(urls_g)-to_use:len(urls_g)]
            for game in urls_g:
                each_list_of_games.append(game)
        else:
            continue
        
    return each_list_of_games
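
> The slice category[35:] works only while every category URL starts with a prefix of exactly 35 characters. A minimal sketch of a less position-dependent way to get the same name is shown below, using urllib.parse from the standard library. The helper name slug_from_url is hypothetical and is not part of the team's code.

```python
from urllib.parse import urlparse


def slug_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    path = urlparse(url).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# Works for both category links (/games/...) and game links (/game/...)
print(slug_from_url("https://www.girlsgogames.com/games/io-games"))
print(slug_from_url("https://www.girlsgogames.com/game/happy-snakes"))
```

> Because the helper reads the path rather than counting characters, the same call covers the /games/ and /game/ prefixes, which differ in length by one character.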


In [12]:
get_game_links(game_categories)

['https://www.girlsgogames.com/game/happy-snakes',
 'https://www.girlsgogames.com/game/wormszone',
 'https://www.girlsgogames.com/game/snowball-racing',
 'https://www.girlsgogames.com/game/paperio-2',
 'https://www.girlsgogames.com/game/bridge-water-rush',
 'https://www.girlsgogames.com/game/monstersio',
 'https://www.girlsgogames.com/game/squid-game-2',
 'https://www.girlsgogames.com/game/fish-eat-getting-big',
 'https://www.girlsgogames.com/game/holeio',
 'https://www.girlsgogames.com/game/goosegame-io',
 'https://www.girlsgogames.com/game/squid-runner',
 'https://www.girlsgogames.com/game/flipsurfio',
 'https://www.girlsgogames.com/game/amogusio',
 'https://www.girlsgogames.com/game/mob-control',
 'https://www.girlsgogames.com/game/run-boys',
 'https://www.girlsgogames.com/game/slippery-water-slides-aquaparkio',
 'https://www.girlsgogames.com/game/ducklingsio',
 'https://www.girlsgogames.com/game/wormsarenaio',
 'https://www.girlsgogames.com/game/basketballio',
 'https://www.girlsgo

EVERYTHING UNDER THIS POINT IS JUST MY INITIAL CODE

In [None]:
#STEP 1: PRINT FULL CATEGORY PAGE TEXT

import requests

r = requests.get("https://www.girlsgogames.com/allcategories")

print(r.text)

In [2]:
#STEP 2: MAKE A LIST OF ALL THE GENRES (AS URLS)

import requests
from bs4 import BeautifulSoup
 
url = 'https://www.girlsgogames.com/allcategories'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
 
urls = []
for link in soup.find_all('a'):
    urls.append(link.get('href'))
    
#removing unnecessary links
del urls[0:204]
del urls[len(urls)-11:len(urls)]

urls.remove("https://www.girlsgogames.com/games/action")
urls.remove("https://www.girlsgogames.com/games/adventure")
urls.remove("https://www.girlsgogames.com/games/animals")
urls.remove("https://www.girlsgogames.com/games/art-and-creativity")
urls.remove("https://www.girlsgogames.com/games/beauty-games")
urls.remove("https://www.girlsgogames.com/games/cooking")
urls.remove("https://www.girlsgogames.com/games/decoration-games")
urls.remove("https://www.girlsgogames.com/games/dress_up")
urls.remove("https://www.girlsgogames.com/games/love-games")
urls.remove("https://www.girlsgogames.com/games/puzzle")
urls.remove("https://www.girlsgogames.com/games/simulation")
urls.remove("https://www.girlsgogames.com/games/skill")
urls.remove("https://www.girlsgogames.com/games/specials")
urls.remove("https://www.girlsgogames.com/games/sports")

print(urls)

['https://www.girlsgogames.com/games/io-games', 'https://www.girlsgogames.com/games/2-player', 'https://www.girlsgogames.com/games/3d-games', 'https://www.girlsgogames.com/games/aim__shoot', 'https://www.girlsgogames.com/games/arcade', 'https://www.girlsgogames.com/games/bomb-it-games', 'https://www.girlsgogames.com/games/boy-games', 'https://www.girlsgogames.com/games/clicker', 'https://www.girlsgogames.com/games/clicking-games', 'https://www.girlsgogames.com/games/easy-games', 'https://www.girlsgogames.com/games/endless-running', 'https://www.girlsgogames.com/games/fidget-spinner', 'https://www.girlsgogames.com/games/flappy-bird', 'https://www.girlsgogames.com/games/flying-games', 'https://www.girlsgogames.com/games/fun', 'https://www.girlsgogames.com/games/funny-games', 'https://www.girlsgogames.com/games/king', 'https://www.girlsgogames.com/games/monster', 'https://www.girlsgogames.com/games/multiplayer', 'https://www.girlsgogames.com/games/nitrome-games', 'https://www.girlsgogames

In [3]:
#STEP 3: MAKE A LIST OF ALL THE GAMES (AS URLS)

import requests
from bs4 import BeautifulSoup

each_list_of_games = []
for genre in urls:
    reqs_g = requests.get(genre)
    soup_g = BeautifulSoup(reqs_g.text, 'html.parser')
    
    urls_g = []
    for link_g in soup_g.find_all('a'):
        urls_g.append(link_g.get('href'))

    #remove unnecessary links
    del urls_g[0:206]
    genre_name = genre[35:]
    linkk = f"https://www.girlsgogames.com/games/{genre_name}"
    if linkk in urls_g:
        index_of_second = urls_g.index(f"https://www.girlsgogames.com/games/{genre_name}")
        to_use = len(urls_g) - index_of_second
        del urls_g[len(urls_g)-to_use:len(urls_g)]
        each_list_of_games.append(urls_g)
    else:
        continue

print(each_list_of_games)

[['https://www.girlsgogames.com/game/happy-snakes', 'https://www.girlsgogames.com/game/wormszone', 'https://www.girlsgogames.com/game/snowball-racing', 'https://www.girlsgogames.com/game/paperio-2', 'https://www.girlsgogames.com/game/goosegame-io', 'https://www.girlsgogames.com/game/fish-eat-getting-big', 'https://www.girlsgogames.com/game/wormsarenaio', 'https://www.girlsgogames.com/game/bridge-water-rush', 'https://www.girlsgogames.com/game/flipsurfio', 'https://www.girlsgogames.com/game/holeio', 'https://www.girlsgogames.com/game/squid-runner', 'https://www.girlsgogames.com/game/amogusio', 'https://www.girlsgogames.com/game/mob-control', 'https://www.girlsgogames.com/game/squid-game-2', 'https://www.girlsgogames.com/game/monstersio', 'https://www.girlsgogames.com/game/run-boys', 'https://www.girlsgogames.com/game/ducklingsio', 'https://www.girlsgogames.com/game/slippery-water-slides-aquaparkio', 'https://www.girlsgogames.com/game/basketballio', 'https://www.girlsgogames.com/game/mus

In [5]:
#STEP 4: MAKE A LIST OF ALL THE GAMES (AS STRINGS)

clean_games = []
for list_c in each_list_of_games:
    for current_link_c in list_c:
        game_name = current_link_c[34:]
        clean_games.append(game_name)
print(clean_games)

['happy-snakes', 'wormszone', 'snowball-racing', 'paperio-2', 'goosegame-io', 'fish-eat-getting-big', 'wormsarenaio', 'bridge-water-rush', 'flipsurfio', 'holeio', 'squid-runner', 'amogusio', 'mob-control', 'squid-game-2', 'monstersio', 'run-boys', 'ducklingsio', 'slippery-water-slides-aquaparkio', 'basketballio', 'muscles-rush', 'worm-hunt-snake-gameio-zone', 'pixel-bubblemanio', 'draw-this-io', 'pet-party', 'color-galaxy', 'warehousepanicio', 'drawario', 'evoworldio', 'conquerio', 'dust-busterio', 'sumoio', 'freethrowio', 'archersio', 'ghostfightio', 'santa-snakes', 'hungry-shark-arena-horror-night', 'yes-or-no-challenge-run', 'fireboy_and_watergirl_the_forest_temple', 'car-parking-city-duel', 'pet-trainer-duel', 'yes-or-no-challenge', 'fireboy__watergirl_3_the_ice_temple', 'snakes-and-ladders', 'boxing-gang-stars', 'fireboy-watergirl-2-the-light-temple', 'gin-rummy', 'fireboy__watergirl_4_crystal_temple', 'group-trivia-quiz', 'city-car-stunt-4', 'tic-tac-toe-paper', 'fish-eat-getting

In [26]:
#STEP _: MAKE A LIST OF THE INDIVIDUAL WORDS IN THE GAME TITLES (AS STRINGS)

import re

game_words = []
for game_ti in clean_games:
    #split each title on runs of hyphens or underscores and skip empty pieces
    for word in re.split(r"[-_]+", game_ti):
        if word:
            game_words.append(word)
print(game_words)

['happy', 'snakes', 'wormszone', 'snowball', 'racing', 'paperio', '2', 'goosegame', 'io', 'fish', 'eat', 'getting', 'big', 'wormsarenaio', 'bridge', 'water', 'rush', 'flipsurfio', 'holeio', 'squid', 'runner', 'amogusio', 'mob', 'control', 'squid', 'game', '2', 'monstersio', 'run', 'boys', 'ducklingsio', 'slippery', 'water', 'slides', 'aquaparkio', 'basketballio', 'muscles', 'rush', 

In [9]:
#STEP _: MAKE TWO LISTS OF ALL THE GAME WORDS AND THEIR OCCURRENCES




['a_hh', 'j']
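
> The cell for this step is empty. One possible way to build the two lists it describes is sketched below with collections.Counter. The sample list is a hypothetical stand-in for game_words, and the variable names unique_words and occurrences are assumptions rather than the team's own.

```python
from collections import Counter

# Hypothetical sample standing in for the full game_words list
sample_words = ["happy", "snakes", "wormszone", "snowball", "racing",
                "paperio", "2", "squid", "runner", "squid", "game", "2"]

word_counts = Counter(sample_words)

# most_common() orders by descending count; ties keep first-seen order
unique_words = [word for word, _ in word_counts.most_common()]
occurrences = [count for _, count in word_counts.most_common()]

print(unique_words)
print(occurrences)
```

> Keeping the words and counts in two parallel lists matches the layout used for the labels, while the Counter itself can still be queried directly for a single word.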

In [27]:
#STEP 5: MAKE A LIST OF ALL THE LABELS ON EVERY GAME (AS URLS) (DO NOT RUN THIS, USE STORED VARIABLE INSTEAD!!!!)

import requests
from bs4 import BeautifulSoup

each_list_of_labels = []
for listt in each_list_of_games:
    for game in listt:
        reqs_l = requests.get(game)
        soup_l = BeautifulSoup(reqs_l.text, 'html.parser')
        
        urls_l = []
        for link_l in soup_l.find_all('a'):
            urls_l.append(link_l.get('href'))

        #remove unnecessary links
        pre_link = "https://www.girlsgogames.com/"
        if pre_link in urls_l:
            urls_l.remove("https://www.girlsgogames.com/")
        else:
            continue
        link_l = "https://www.girlsgogames.com/"
        link_l_two = 'https://www.girlsgogames.com/disclaimer'
        if link_l in urls_l and link_l_two in urls_l:
            key_index = urls_l.index("https://www.girlsgogames.com/")
            del urls_l[0:key_index+2]
            key_index_two = urls_l.index('https://www.girlsgogames.com/disclaimer')
            diff = len(urls_l) - key_index_two
            del urls_l[len(urls_l)-diff:len(urls_l)]
            each_list_of_labels.append(urls_l)
        else:
            continue
        
print(each_list_of_labels)

[['https://www.girlsgogames.com/games/io-games', 'https://www.girlsgogames.com/games/animals', 'https://www.girlsgogames.com/games/best-of-2019-girls-', 'https://www.girlsgogames.com/games/boy-games', 'https://www.girlsgogames.com/games/fun', 'https://www.girlsgogames.com/games/multiplayer', 'https://www.girlsgogames.com/games/play-games-stay-safe', 'https://www.girlsgogames.com/games/popular', 'https://www.girlsgogames.com/games/skill', 'https://www.girlsgogames.com/games/try-now-gd-girls', 'https://www.girlsgogames.com/games/free-games'], ['https://www.girlsgogames.com/games/io-games', 'https://www.girlsgogames.com/games/action', 'https://www.girlsgogames.com/games/arcade', 'https://www.girlsgogames.com/games/html5', 'https://www.girlsgogames.com/games/mobile_games', 'https://www.girlsgogames.com/games/popular'], ['https://www.girlsgogames.com/games/io-games', 'https://www.girlsgogames.com/games/animals', 'https://www.girlsgogames.com/games/boy-games', 'https://www.girlsgogames.com/g
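
> Because this step requests every game page, its result is worth storing once and reloading afterwards. A minimal sketch of one way to do that with the standard json module follows. The file name labels_cache.json and the functions save_labels and load_labels are hypothetical, and the two-game sample merely stands in for each_list_of_labels.

```python
import json
import os
import tempfile


def save_labels(labels, path):
    """Write the nested list of label URLs to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(labels, f)


def load_labels(path):
    """Read the nested list of label URLs back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical two-game sample; the real list comes from the scraping loop
sample_labels = [
    ["https://www.girlsgogames.com/games/io-games",
     "https://www.girlsgogames.com/games/animals"],
    ["https://www.girlsgogames.com/games/io-games",
     "https://www.girlsgogames.com/games/action"],
]

# Hypothetical cache location in the system temp directory
cache_path = os.path.join(tempfile.gettempdir(), "labels_cache.json")
save_labels(sample_labels, cache_path)
restored = load_labels(cache_path)
print(restored == sample_labels)
```

> JSON keeps the list-of-lists shape intact, so later steps can loop over the reloaded data exactly as they would over the freshly scraped list.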

In [7]:
#STEP 6: MAKE A LIST OF ALL THE LABELS (AS STRINGS)

import labellist

saved_labels = labellist.labels

clean_labels = []
for list_f in saved_labels:
    for current_link in list_f:
        label_name = current_link[35:]
        clean_labels.append(label_name)
print(clean_labels)

['io-games', 'animals', 'best-of-2019-girls-', 'boy-games', 'fun', 'multiplayer', 'play-games-stay-safe', 'popular', 'skill', 'try-now-gd-girls', 'free-games', 'io-games', 'action', 'arcade', 'html5', 'mobile_games', 'popular', 'io-games', 'animals', 'boy-games', 'multiplayer', 'play-games-stay-safe', 'popular', 'free-games', 'io-games', '3d-games', 'action', 'html5', 'mobile_games', 'popular', 'racing', 'snow-games', 'winter-games', 'io-games', 'arcade', 'fun', 'html5', 'multiplayer', 'point__click', 'popular', 'skill', 'io-games', 'action', 'multiplayer', 'play-games-stay-safe', 'popular', 'top-100', 'free-games', 'io-games', '3d-games', 'action', 'collecting', 'html5', 'mobile_games', 'multiplayer', 'popular', 'racing', 'io-games', '2-player', 'action', 'fish-games', 'html5', 'multiplayer', 'popular', 'singleplayer', 'io-games', '3d-games', 'action', 'adventure', 'arcade', 'html5', 'mobile_games', 'multiplayer', 'popular', 'racing', 'running-games', 'skill', 'io-games', '3d-games', 

In [8]:
#STEP 7: MAKE TWO LISTS OF ALL THE LABELS AND THEIR OCCURRENCES #fix this

from collections import Counter

#count every label once, then list the unique labels alphabetically
label_counts = Counter(clean_labels)
sing_labels = sorted(label_counts)
sing_counts = [label_counts[label] for label in sing_labels]

print(sing_labels)
print(sing_counts)

['', '-like-pizza', '1001_arabian_nights_games', '10x10', '2-player', '3d-games', 'accessories_dress_up_games', 'action', 'adam-and-eve', 'adventure', 'agic-pencil', 'agic-piano-tiles', 'ahjong-connect', 'ahjong-cook', 'aim__shoot', 'air-games', 'and-skin-doctor', 'animal_dress_up_games', 'animals', 'anime-games', 'apps', 'appy-snakes', 'ar-games', 'ar-salon', 'arcade', 'ark-me', 'arkadium', 'arking-rush', 'art-and-creativity', 'art-games', 'aster-chess-multiplayer', 'autumn', 'avatars', 'ave-the-baby-home-rush', 'avoiding', 'award_games', 'baby-games', 'baby_hazel_games', 'babysitting', 'back-to-candyland', 'back_to_school_games', 'baking', 'basketball', 'beach-games', 'beauty-games', 'best-of-2016', 'best-of-2017', 'best-of-2018', 'best-of-2019-girls-', 'bike', 'birthday-games', 'board__card', 'bomb-it-games', 'boy-dress-up-games', 'boy-games', 'boyfriend-games', 'brain-games', 'brain-games?page=2', 'brain-games?page=6', 'bubble_shooter', 'building-games', 'bunny-games', 'business-ga
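
> The label list printed above contains entries such as '', 'appy-snakes', and 'brain-games?page=2'. These look consistent with the fixed slice [35:] being applied to links that are not plain category URLs: game links under /game/ lose their first letter, paginated links keep their query string, and the bare prefix leaves an empty string. A minimal sketch of a filter that keeps only category labels is given below. The name label_from_url and the sample list are hypothetical, and the explanation of the stray entries is an inference from the output rather than something confirmed against the site.

```python
LABEL_PREFIX = "https://www.girlsgogames.com/games/"


def label_from_url(url):
    """Return the category label for a label URL, or None for any other link."""
    if not url or not url.startswith(LABEL_PREFIX):
        return None
    # Drop the prefix, then cut off any query string such as ?page=2
    label = url[len(LABEL_PREFIX):].partition("?")[0]
    return label or None


# Hypothetical sample mixing label links with the kinds of links to skip
sample_links = [
    "https://www.girlsgogames.com/games/io-games",
    "https://www.girlsgogames.com/game/happy-snakes",
    "https://www.girlsgogames.com/games/brain-games?page=2",
    "https://www.girlsgogames.com/games/",
    None,
]

filtered_labels = []
for link in sample_links:
    label = label_from_url(link)
    if label is not None:
        filtered_labels.append(label)
print(filtered_labels)
```

> Applying a check like this before counting would stop game titles and paginated links from being tallied as labels, at the cost of merging each paginated entry into its base category.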