# MONGO DB ANALYSIS

## Imports

In [1]:
from pymongo import MongoClient
import pandas as pd
import time

## Connection with MongoDB

In [4]:
client = MongoClient("localhost:27017")
db = client["Ironhack"]
coll = db.get_collection("companies")

## Find unique values of category_code

In [18]:
unique_cat = pd.DataFrame(coll.find({}, {"category_code":1, "name":1, "_id":0}))

In [19]:
unique_cat["category_code"].unique()

array(['web', 'enterprise', 'software', 'news', 'social',
       'network_hosting', 'games_video', 'music', 'mobile', 'search',
       'advertising', 'messaging', 'security', 'photo_video', 'finance',
       'hardware', 'ecommerce', 'travel', 'public_relations', 'other',
       'real_estate', 'semiconductor', 'analytics', 'health', 'legal',
       'sports', 'biotech', 'cleantech', 'education', 'consulting',
       'transportation', None, 'hospitality', 'fashion', 'nonprofit',
       'nanotech', 'automotive', 'design', 'manufacturing', 'government',
       'local', 'medical'], dtype=object)

Relevant categories: games_video, design, mobile. Gamedev companies could be filed under design, too. If whoever built this database mislabeled the category, gamedev companies could appear under both, so let's check how many companies are tagged as design.
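To judge how the categories are populated before picking one, the companies per category can be counted on the server instead of loading every document into pandas. This is only a sketch: `count_by_category` is a hypothetical helper name, and it assumes the collection object exposes pymongo's `aggregate` method and that the server supports the `$sortByCount` stage.

```python
def count_by_category(collection):
    """Return (category_code, count) pairs, most frequent category first."""
    # $sortByCount groups documents by the given expression and
    # sorts the groups by descending count.
    pipeline = [{"$sortByCount": "$category_code"}]
    return [(doc["_id"], doc["count"]) for doc in collection.aggregate(pipeline)]
```

With the `coll` object defined above, `count_by_category(coll)` would list every category with its size, including the `None` group for companies without a category.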

In [38]:
filt = {"category_code":"design"}
proj = {"category_code":1, "name":1, "_id":0}

categ = pd.DataFrame(coll.find(filt,proj))
categ.shape

(4, 2)

Only 4 companies, which is too few. Filtering by category is not the best way to target gamedev companies, so let's use the "description" and "overview" fields instead.

## Find gaming companies by description, overview, total money raised, country

In [128]:
df = pd.DataFrame(coll.find({}, {"description":1, "name":1, "_id":0}))

In [129]:
df.head()

| | name | description |
|---|---|---|
| 0 | Wetpaint | Technology Platform Company |
| 1 | AdventNet | Server Management Software |
| 2 | Zoho | Online Business Apps Suite |
| 3 | Digg | user driven social content website |
| 4 | Facebook | Social network |


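Let's use regular expressions to find the companies related to the gaming industry. The filter matches on description, overview, tag_list and total_money_raised (amounts in millions or billions of dollars); the country of each office is returned through the projection.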
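The two patterns can be tried locally with Python's `re` module before sending them to MongoDB. This is a rough sketch: the helper names are hypothetical, `[Gg]am` is a shorter pattern that matches the same strings as `.*gam.*|.*Gam.*` in an unanchored search, and Python's regex engine is assumed to behave like MongoDB's for patterns this simple.

```python
import re

# Matches any text containing "gam" or "Gam".
GAME_RE = re.compile(r"[Gg]am")
# A dollar sign followed, somewhere later, by "B" or "M".
MONEY_RE = re.compile(r"\$.*[BM]")

def looks_like_gaming(text):
    """True if the text contains 'gam' or 'Gam'; False for empty or missing text."""
    return bool(text) and GAME_RE.search(text) is not None

def raised_millions_or_more(amount):
    """True if the amount string looks like '$...M' or '$...B'."""
    return bool(amount) and MONEY_RE.search(amount) is not None

print(looks_like_gaming("Social Gaming Portal & Platform"))  # True
print(looks_like_gaming("Technology Platform Company"))      # False
print(raised_millions_or_more("$41.5M"))                     # True
```

Because this is a plain substring match, it also accepts unrelated words such as "gambling" or "origami", so the results still need a manual look.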

In [130]:
# All conditions in one filter document are ANDed implicitly, so "$and" is not needed.
filt = {"description": {"$regex": "[Gg]am"}, "overview": {"$regex": "[Gg]am"},
        "tag_list": {"$regex": "[Gg]am"},
        "total_money_raised": {"$regex": r"\$.*B|\$.*M"}}  # raw string keeps "\$" intact

proj = {"name":1, "description":1, "overview":1, "tag_list":1, "total_money_raised":1,
        "offices.country_code":1, "number_of_employees":1, "_id":0}

df_regex = pd.DataFrame(coll.find(filt,proj))
df_regex

| | name | number_of_employees | tag_list | description | overview | total_money_raised | offices |
|---|---|---|---|---|---|---|---|
| 0 | Thumbplay | 70.0 | mobile, music, video, sharing, gaming, cloud, ... | Music, Videos, Games for Mobile Devices | `<p>Thumbplay is a provider of mobile entertain...` | $41.5M | [{'country_code': 'USA'}] |
| 1 | Xfire | | games, pc, entertainment, onlinegaming, skillg... | Social Gaming Portal & Platform | `<p>Xfire is the leader in social gaming servic...` | $7M | [{'country_code': 'USA'}] |
| 2 | OMGPOP | 50.0 | dating, gaming, auction | Free online multiplayer game | `<p>OMGPOP (formerly known as iminlikewithyou) ...` | $16.6M | [{'country_code': 'USA'}, {'country_code': 'US... |
| 3 | FlowPlay | 30.0 | flowplay, casual-games, virtual-world, avatars... | Virtual world technology and games | `<p>FlowPlay, a developer of browser-based virt...` | $3.97M | [{'country_code': 'USA'}] |
| 4 | PlaySpan | | mmog, commerce, gamecommerce | P2P in-game commerce | `<p>PlaySpan™ is the global leader in monetiz...` | $46.3M | [{'country_code': 'USA'}, {'country_code': 'US... |
| 5 | GameLayers | 5.0 | extension, firefox, game, passive, multiplayer | Multiplayer Networked Games | `<p>GameLayers Inc was a small game design comp...` | $2M | [{'country_code': 'USA'}] |
| 6 | WildTangent | | games, online-games, video-games, online-adver... | cross-device games company | `<p>WildTangent is a worldwide cross-device gam...` | $84M | [{'country_code': 'USA'}] |
| 7 | PlayFirst | | mobile-games, iphone-games, ipad-games, casual... | Mobile games | `<p>PlayFirst is the global leader in mobile ga...` | $37.7M | [{'country_code': 'USA'}, {'country_code': 'IR... |
| 8 | Challenge Games | | social-games, facebook-games, role-playing-gam... | Online social games | `<p><a href="http://www.challengegames.com" tit...` | $14.5M | [{'country_code': 'USA'}] |
| 9 | Bunchball | 60.0 | games, socialgaming, socialengagement | Gamification | `<p>Bunchball is the market leader and visionar...` | $12.5M | [{'country_code': 'USA'}] |
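The offices column comes back as a list of dictionaries, one per office, which is awkward to filter by country. One possible way to flatten it is sketched below; `country_codes` is a hypothetical helper, and it assumes each office is a dictionary that may carry a `country_code` key, as in the output above.

```python
def country_codes(offices):
    """Return the distinct country codes of a company's offices, in first-seen order."""
    # Companies without offices may show up as None or NaN instead of a list.
    if not isinstance(offices, list):
        return []
    seen = []
    for office in offices:
        code = office.get("country_code")
        if code and code not in seen:
            seen.append(code)
    return seen

print(country_codes([{"country_code": "USA"}, {"country_code": "USA"}]))  # ['USA']
```

Applied to the result above, `df_regex["offices"].apply(country_codes)` would give one list of countries per company.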


## Filtering by company value