![](https://api.brandy.run/core/core-logo-wide)

# MongoDB

In this lesson we look at a new kind of database: a non-relational, `NoSQL` one. It does not use tables as PostgreSQL does; instead it organizes data in a way that will feel familiar because it resembles Python's data types. The basic `CRUD` operations and the core database concepts still apply.

## Databases, Collections and Documents

As in SQL, the top-level unit of organization in MongoDB is called a `Database`. A database, however, is not divided into tables but into `Collections`.

Each collection holds documents, which are the entities we want to record. Each document represents one record and contains an `_id`: a unique identifier of type `ObjectId` that is generated automatically when we do not supply one. Beyond that, a document holds whatever attributes we need and want to store.

![](img/mongo_org.png)

## Table vs Collection

One of the main differences between a SQL database and MongoDB is that in MongoDB the documents of a collection are not bound to a single format defined for the whole collection. In SQL, every entity (a row in a table) must have the same format: apart from possible null values, all rows have the same attributes.

MongoDB imposes no such rule by default: each document has its own attributes and can have more or fewer as needed. We can also add attributes to, or remove them from, documents that already exist.
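
As a rough illustration in plain Python (no MongoDB involved), a collection behaves much like a list of dictionaries that need not share the same keys. The `people` list and its fields are made up for this sketch.

```python
# A hypothetical "collection": a plain list of dicts with different shapes.
people = [
    {"_id": 1, "name": "Ana", "email": "ana@example.com"},
    {"_id": 2, "name": "Luis", "languages": ["es", "pt"]},
]

# Add an attribute to one existing "document" only.
people[0]["city"] = "Madrid"

# Remove an attribute from another one.
del people[1]["languages"]

# Each document now has its own set of keys.
for person in people:
    print(sorted(person))
```

Nothing forces the two dictionaries to agree on their keys, which is the same freedom documents have inside a collection.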

## Document vs Object vs Dictionary

The unit of work in MongoDB, its record, is the document. Because MongoDB's data model is based on `JSON` (JavaScript Object Notation), a document is a series of attributes in `key: value` form. For our purposes, a document is therefore practically indistinguishable from a Python dictionary.

```json
{
    _id: ObjectID("13jdk9j3jsj3nr93572930u9c"),
    title: "Guardians of the Galaxy",
    year: 2014,
    director: ObjectID("13jdk9j3jssf3r9357293w3a"),
    synopsis: "Brash space adventurer Peter Quill (Chris Pratt) finds himself the quarry of relentless bounty hunters after he steals an orb coveted by Ronan, a powerful villain. To evade Ronan, Quill is forced into an uneasy truce with four disparate misfits: gun-toting Rocket Raccoon, treelike-humanoid Groot, enigmatic Gamora, and vengeance-driven Drax the Destroyer. But when he discovers the orb's true power and the cosmic threat it poses, Quill must rally his ragtag group to save the universe.",
    genre: [ObjectID("13fgd9j3jssf3r9357293w3a"),
            ObjectID("13fgd9j3jssf3r9357293w3b"),
            ObjectID("13fgd9j3jssf3r9357293w3c")],
    cast : [
            {"actor":ObjectID("13jdk9j3jssf3r94w423tkk"), "character":"Peter Quill"},
            {"actor":ObjectID("32jdk9j3jf8093u9n9f9wek"), "character":"Groot"},
            {"actor":ObjectID("19wef89f9sesf3kvaiji99e"), "character":"Rocket"},
           ]
}
```

MongoDB has no explicit relations like those in SQL, and a plain `find` query cannot combine records from different collections the way a SQL join combines tables. We can, however, reference one document from another by its `_id`, just as the document above references the actor below, Chris Pratt.


```json
{
    _id: ObjectID("13jdk9j3jssf3r94w423tkk"),
    name: "Chris Pratt",
    birthday: "June 21, 1979",
    place_of_birth: "Virginia, Minnesota, USA"
}
```
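
Following such a reference means a second lookup by `_id`. The sketch below mimics that with plain Python dictionaries; the short string ids and the `cast_names` helper are invented for the illustration, and against a real database the lookup would be a second query on the actors collection.

```python
# Hypothetical in-memory stand-ins for two collections.
# Plain strings play the role of ObjectId values here.
actors = {
    "a1": {"_id": "a1", "name": "Chris Pratt"},
}
movies = [
    {
        "_id": "m1",
        "title": "Guardians of the Galaxy",
        "cast": [{"actor": "a1", "character": "Peter Quill"}],
    },
]

def cast_names(movie, actors_by_id):
    """Replace each actor reference with the referenced actor's name."""
    return [
        (actors_by_id[entry["actor"]]["name"], entry["character"])
        for entry in movie["cast"]
    ]

print(cast_names(movies[0], actors))
```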

## Querying mongoDB

As in SQL, one of the main skills to practice in MongoDB is retrieving the relevant information. To do so we write `queries`, not in SQL but in `MQL` (MongoDB Query Language). Don't worry, though: MongoDB queries look a lot like MongoDB documents, so writing one is much like writing a dictionary.

![](img/compass.png)

Whether we use MongoDB Compass, as shown above, or Python, a query is always made up of these same parts.

## Establishing connection

The library we will use to connect Python to MongoDB is [pymongo](https://pymongo.readthedocs.io/en/stable/).

In [None]:
mongodb+srv://<username>:<password>@core-bdml.zr1wc.mongodb.net/test

In [2]:
from pymongo import MongoClient
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
url = os.getenv("url")

In [4]:
url

'mongodb+srv://clase:clase@core-bdml.zr1wc.mongodb.net/test'

In [5]:
client = MongoClient(url)

In [6]:
client

MongoClient(host=['core-bdml-shard-00-00.zr1wc.mongodb.net:27017', 'core-bdml-shard-00-02.zr1wc.mongodb.net:27017', 'core-bdml-shard-00-01.zr1wc.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='atlas-p4t5f9-shard-0', ssl=True)

In [7]:
db = client.get_database("core-bdml")

In [9]:
companies = db.companies

In [10]:
companies

Collection(Database(MongoClient(host=['core-bdml-shard-00-00.zr1wc.mongodb.net:27017', 'core-bdml-shard-00-02.zr1wc.mongodb.net:27017', 'core-bdml-shard-00-01.zr1wc.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, authsource='admin', replicaset='atlas-p4t5f9-shard-0', ssl=True), 'core-bdml'), 'companies')
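
If the username or password contains characters such as `@` or `/`, they have to be percent-encoded before going into the connection string. A minimal sketch using only the standard library follows; the `build_uri` helper, the host and the credentials are placeholders, not part of the lesson's setup.

```python
from urllib.parse import quote_plus

def build_uri(user, password, host, database):
    """Assemble a mongodb+srv URI with percent-encoded credentials."""
    return "mongodb+srv://{}:{}@{}/{}".format(
        quote_plus(user), quote_plus(password), host, database
    )

# Placeholder values, chosen only to show the escaping.
uri = build_uri("clase", "p@ss/word", "cluster.example.net", "test")
print(uri)
```

The resulting string can be stored in the `.env` file and read with `os.getenv`, as in the cells above.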

## Filter

The `filter` parameter is the most important part of the query. In it we define the conditions that decide which documents we get back and which we do not. It plays a role similar to the `WHERE` clause in SQL.

In [16]:
cur = companies.find({})

In [17]:
res = list(cur)

In [13]:
# Open a fresh cursor here: `cur` above was already exhausted by list(cur)
next(companies.find({}))

{'_id': ObjectId('61602944d18af5e95f243f20'),
 'name': 'Wetpaint',
 'permalink': 'abc2',
 'crunchbase_url': 'http://www.crunchbase.com/company/wetpaint',
 'homepage_url': 'http://wetpaint-inc.com',
 'blog_url': 'http://digitalquarters.net/',
 'blog_feed_url': 'http://digitalquarters.net/feed/',
 'twitter_username': 'BachelrWetpaint',
 'category_code': 'web',
 'number_of_employees': 47,
 'founded_year': 2005,
 'founded_month': 10,
 'founded_day': 17,
 'deadpooled_year': 1,
 'tag_list': 'wiki, seattle, elowitz, media-industry, media-platform, social-distribution-system',
 'alias_list': '',
 'email_address': 'info@wetpaint.com',
 'phone_number': '206.859.6300',
 'description': 'Technology Platform Company',
 'created_at': {'$date': 1180075887000},
 'updated_at': 'Sun Dec 08 07:15:44 UTC 2013',
 'overview': '<p>Wetpaint is a technology platform company that uses its proprietary state-of-the-art technology and expertise in social media to build and monetize audiences for digital publishers.

In [19]:
res[:5]

[{'_id': ObjectId('61602944d18af5e95f243f20'),
  'name': 'Wetpaint',
  'permalink': 'abc2',
  'crunchbase_url': 'http://www.crunchbase.com/company/wetpaint',
  'homepage_url': 'http://wetpaint-inc.com',
  'blog_url': 'http://digitalquarters.net/',
  'blog_feed_url': 'http://digitalquarters.net/feed/',
  'twitter_username': 'BachelrWetpaint',
  'category_code': 'web',
  'number_of_employees': 47,
  'founded_year': 2005,
  'founded_month': 10,
  'founded_day': 17,
  'deadpooled_year': 1,
  'tag_list': 'wiki, seattle, elowitz, media-industry, media-platform, social-distribution-system',
  'alias_list': '',
  'email_address': 'info@wetpaint.com',
  'phone_number': '206.859.6300',
  'description': 'Technology Platform Company',
  'created_at': {'$date': 1180075887000},
  'updated_at': 'Sun Dec 08 07:15:44 UTC 2013',
  'overview': '<p>Wetpaint is a technology platform company that uses its proprietary state-of-the-art technology and expertise in social media to build and monetize audiences f

In [20]:
type(res[0])

dict

In [21]:
res[0].keys()

dict_keys(['_id', 'name', 'permalink', 'crunchbase_url', 'homepage_url', 'blog_url', 'blog_feed_url', 'twitter_username', 'category_code', 'number_of_employees', 'founded_year', 'founded_month', 'founded_day', 'deadpooled_year', 'tag_list', 'alias_list', 'email_address', 'phone_number', 'description', 'created_at', 'updated_at', 'overview', 'image', 'products', 'relationships', 'competitions', 'providerships', 'total_money_raised', 'funding_rounds', 'investments', 'acquisition', 'acquisitions', 'offices', 'milestones', 'video_embeds', 'screenshots', 'external_links', 'partners'])

An empty query `{}` returns every document in the collection.

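In [98]:
# Ask the server for three documents only; list(companies.find({}))[:3] would download the whole collection first
list(companies.find({}).limit(3))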

To search in MongoDB, we simply pass as the search parameter a document that looks like the ones we want back.

For example, to find every company document whose name is `Facebook`, we pass a dictionary containing `name: "Facebook"`.
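
The idea of "querying by example" can be approximated in a few lines of plain Python. This is a simplified sketch: the `matches` helper and the three sample documents are invented here, and it ignores MongoDB-specific rules such as how arrays and nulls are compared.

```python
def matches(doc, filt):
    """True when every key in filt is present in doc with an equal value."""
    return all(key in doc and doc[key] == value for key, value in filt.items())

docs = [
    {"name": "Facebook", "category_code": "social"},
    {"name": "Twitter", "category_code": "social"},
    {"name": "Digg", "category_code": "news"},
]

by_name = [d["name"] for d in docs if matches(d, {"name": "Facebook"})]
by_category = [d["name"] for d in docs if matches(d, {"category_code": "social"})]
everything = [d["name"] for d in docs if matches(d, {})]  # an empty filter matches all

print(by_name, by_category, everything)
```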


- Literal search

In [23]:
cur = companies.find({"name":"Facebook"})

In [24]:
res = list(cur)

In [25]:
len(res)

1

In [26]:
res[0]

{'_id': ObjectId('61602944d18af5e95f243f24'),
 'name': 'Facebook',
 'permalink': 'facebook',
 'crunchbase_url': 'http://www.crunchbase.com/company/facebook',
 'homepage_url': 'http://facebook.com',
 'blog_url': 'http://blog.facebook.com',
 'blog_feed_url': 'http://blog.facebook.com/atom.php',
 'twitter_username': 'facebook',
 'category_code': 'social',
 'number_of_employees': 5299,
 'founded_year': 2004,
 'founded_month': 2,
 'founded_day': 1,
 'deadpooled_year': None,
 'deadpooled_month': None,
 'deadpooled_day': None,
 'deadpooled_url': '',
 'tag_list': 'facebook, college, students, profiles, network, online-communities, social-networking',
 'alias_list': '',
 'email_address': '',
 'phone_number': '',
 'description': 'Social network',
 'created_at': 'Fri May 25 21:22:15 UTC 2007',
 'updated_at': 'Thu Nov 21 19:40:55 UTC 2013',
 'overview': '<p>Facebook is the world&#8217;s largest social network, with over <a href="http://techcrunch.com/2013/07/24/facebook-growth-2/" title="1.15 bill

To find every document with `category_code: 'social'`:

In [28]:
next(companies.find({"category_code":"social"}))

{'_id': ObjectId('61602944d18af5e95f243f24'),
 'name': 'Facebook',
 'permalink': 'facebook',
 'crunchbase_url': 'http://www.crunchbase.com/company/facebook',
 'homepage_url': 'http://facebook.com',
 'blog_url': 'http://blog.facebook.com',
 'blog_feed_url': 'http://blog.facebook.com/atom.php',
 'twitter_username': 'facebook',
 'category_code': 'social',
 'number_of_employees': 5299,
 'founded_year': 2004,
 'founded_month': 2,
 'founded_day': 1,
 'deadpooled_year': None,
 'deadpooled_month': None,
 'deadpooled_day': None,
 'deadpooled_url': '',
 'tag_list': 'facebook, college, students, profiles, network, online-communities, social-networking',
 'alias_list': '',
 'email_address': '',
 'phone_number': '',
 'description': 'Social network',
 'created_at': 'Fri May 25 21:22:15 UTC 2007',
 'updated_at': 'Thu Nov 21 19:40:55 UTC 2013',
 'overview': '<p>Facebook is the world&#8217;s largest social network, with over <a href="http://techcrunch.com/2013/07/24/facebook-growth-2/" title="1.15 bill

In [30]:
len(list(companies.find({"name":"facebook"})))

0
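
The last cell returns `0` because literal matching is case-sensitive: `"facebook"` and `"Facebook"` are different strings. One common way around this is the `$regex` operator (covered in the Query Operators section) together with `"$options": "i"`. The sketch below builds such a filter and checks the pattern locally with Python's `re` module, which only approximates how the server evaluates it; the sample names are illustrative.

```python
import re

# Filter that could be passed to companies.find(...): anchored, case-insensitive.
filt = {"name": {"$regex": "^facebook$", "$options": "i"}}

# Local approximation with Python's re module (not the server's regex engine).
pattern = re.compile(filt["name"]["$regex"], re.IGNORECASE)
names = ["Facebook", "facebook", "Facebookster", "Digg"]
matched = [n for n in names if pattern.search(n)]
print(matched)
```

The `^` and `$` anchors keep names such as `Facebookster` out of the result.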

With a list comprehension we can see just the company names:

In [31]:
[com["name"] for com in companies.find({})]

['Wetpaint',
 'AdventNet',
 'Zoho',
 'Digg',
 'Facebook',
 'Omnidrive',
 'Postini',
 'Geni',
 'Flektor',
 'Fox Interactive Media',
 'Twitter',
 'StumbleUpon',
 'Gizmoz',
 'Scribd',
 'Slacker',
 'Lala',
 'Helio',
 'eBay',
 'MeetMoi',
 'Joost',
 'CBS',
 'Viacom',
 'Babelgum',
 'Plaxo',
 'Cisco',
 'Yahoo!',
 'Powerset',
 'Technorati',
 'SpinVox',
 'AddThis',
 'OpenX',
 'Mahalo',
 'Sparter',
 'Kyte',
 'Veoh',
 'Gannett',
 'Thoof',
 'Jingle Networks',
 'Info',
 'JotSpot',
 'Meetup',
 'Mercora',
 'NetRatings',
 'LifeLock',
 'Wesabe',
 'Jangl SMS',
 'SmugMug',
 'Prosper',
 'Google',
 'Jajah',
 'Skype',
 'YouTube',
 'Stickam',
 'blogTV',
 'Livestream',
 'Ustream',
 'AdaptiveBlue',
 'Pando Networks',
 'Intel',
 'GrandCentral',
 'Ikan',
 'delicious',
 'Topix',
 'Jobster',
 'Pownce',
 'Revision3',
 'AllPeers',
 'CriticalMetrics',
 'ZenZui',
 'Spock',
 'Wize',
 'SodaHead',
 'CastTV',
 'iSkoot',
 'EQO',
 'AllofMP3',
 'There',
 'SellABand',
 'Funny Or Die',
 'Steorn',
 'iContact',
 'MeeVee',
 'blink

## Project

There is a simpler way to do that, though: the second field of a query, `project`, which is the equivalent of the `SELECT` clause in SQL. In it we choose which parts of the resulting documents we want to see.

In [75]:
filt = {"name":"Facebook"}
project = {"name":1, "_id":0}
next(companies.find(filt, project))

{'name': 'Facebook'}

When we use `project`, the `_id` is always included by default unless we set it to `0`. Besides choosing which fields to see, we can choose which ones to omit by marking them with `0` instead of `1`.

`NOTE: With the exception of '_id', we cannot mix 0 and 1 in the project.`
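
The rule in the note can be sketched in plain Python. The `apply_projection` helper below is hypothetical and handles top-level fields only; edge cases such as a projection that contains nothing but `_id` are not modelled.

```python
def apply_projection(doc, projection):
    """Emulate an inclusion or exclusion projection on top-level fields."""
    flags = {k: v for k, v in projection.items() if k != "_id"}
    if len(set(flags.values())) > 1:
        raise ValueError("cannot mix 0 and 1, except for _id")
    if 1 in flags.values():
        # Inclusion: keep only the listed fields, in document order.
        result = {k: v for k, v in doc.items() if k in flags}
    else:
        # Exclusion: keep everything but the listed fields.
        result = {k: v for k, v in doc.items() if k not in flags and k != "_id"}
    # _id is shown unless it is explicitly switched off.
    if projection.get("_id", 1) == 1 and "_id" in doc:
        result = {"_id": doc["_id"], **result}
    return result

doc = {"_id": 7, "name": "Facebook", "permalink": "facebook", "founded_year": 2004}
print(apply_projection(doc, {"name": 1, "_id": 0}))
print(apply_projection(doc, {"name": 0, "permalink": 0}))
```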

In [80]:
filt = {"name":"Facebook"}
project = {"founded_year":1, "name":1, "permalink":1, "_id":0}
next(companies.find(filt, project))

{'name': 'Facebook', 'permalink': 'facebook', 'founded_year': 2004}

In [36]:
filt = {"name":"Facebook"}
project = {"name":1, "_id":0,  "permalink":1}
next(companies.find(filt, project))

{'name': 'Facebook', 'permalink': 'facebook'}

In [83]:
filt = {"name":"Facebook"}
project = {"name":0, "_id":1,  "permalink":0, "founded_year":0}
next(companies.find(filt, project))

{'_id': ObjectId('61602944d18af5e95f243f24'),
 'crunchbase_url': 'http://www.crunchbase.com/company/facebook',
 'homepage_url': 'http://facebook.com',
 'blog_url': 'http://blog.facebook.com',
 'blog_feed_url': 'http://blog.facebook.com/atom.php',
 'twitter_username': 'facebook',
 'category_code': 'social',
 'number_of_employees': 5299,
 'founded_month': 2,
 'founded_day': 1,
 'deadpooled_year': None,
 'deadpooled_month': None,
 'deadpooled_day': None,
 'deadpooled_url': '',
 'tag_list': 'facebook, college, students, profiles, network, online-communities, social-networking',
 'alias_list': '',
 'email_address': '',
 'phone_number': '',
 'description': 'Social network',
 'created_at': 'Fri May 25 21:22:15 UTC 2007',
 'updated_at': 'Thu Nov 21 19:40:55 UTC 2013',
 'overview': '<p>Facebook is the world&#8217;s largest social network, with over <a href="http://techcrunch.com/2013/07/24/facebook-growth-2/" title="1.15 billion monthly active users">1.15 billion monthly active users</a>.</p>\n

## .distinct
To list all the distinct values of an attribute, we can use this method.

In [88]:
companies.find({}).distinct("name")

['(fluff)Friends',
 '*faircompanies',
 '/community',
 '1 800 vending',
 '1 to 101',
 '1-800-905-GEEK',
 '1000 Markets',
 '1000MIKES',
 '100for100 Web Hosting',
 '101 Holidays',
 '10East',
 '10to1',
 '111pix',
 '12 Inch Design',
 '123i',
 '123people',
 '12seconds',
 '12snap Mobile Advertising and Entertainment',
 '1366 Technologies',
 '140Labs',
 '140Ware',
 '140it',
 '148Apps',
 '15Talents',
 '15secondTV',
 '18D Information Technology China',
 '1915 Studios',
 '1938 Media',
 '1C Company',
 '1Cast',
 '1DayMakeover',
 '1FreeCart',
 '1Scan',
 '1Up',
 '1bib',
 '1calendar',
 '2 Levels Above',
 '2 Minutes',
 '2 under entertainment',
 '2020systems',
 '20:20 Mobile',
 '20DC',
 '22Plus Network',
 '23andMe',
 '247techsupport',
 '24Access',
 '24SevenOffice',
 '25 Pixels Media',
 '280 North',
 '2AdPro',
 '2Big2Send',
 '2Bmates',
 '2GeeksinaLab',
 '2U',
 '2Vouch',
 '2Web Technologies',
 '2Win-Solutions',
 '2Wire',
 '2channel',
 '2ergo',
 '2pad',
 '2threads',
 '2way interactive GmbH',
 '3 Phases Ren
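
Conceptually, `distinct` collapses repeated values into a single entry. A plain-Python approximation is shown below; the `distinct` function and the sample data are made up for the sketch, and the order of the real result is decided by the server, not by this code.

```python
def distinct(docs, key):
    """Collect the different values of key, skipping documents that lack it."""
    seen = []
    for doc in docs:
        if key in doc and doc[key] not in seen:
            seen.append(doc[key])
    return seen

docs = [
    {"name": "Facebook", "category_code": "social"},
    {"name": "Twitter", "category_code": "social"},
    {"name": "Digg", "category_code": "news"},
    {"name": "Info"},  # no category_code at all
]
print(distinct(docs, "category_code"))
```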

## Query Operators

We will not always want to search for a literal value, though; we need more flexible queries. For those we use the MQL operators, all of which are listed at the link below.

- [MongoDB Query Operators](https://docs.mongodb.com/manual/reference/operator/query/)

In [90]:
companies.find({"founded_year": {"$gt": 2005} }).distinct("founded_year")

[2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013]
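
To get a feel for how the comparison operators read, here is a small plain-Python emulation of a few of them (`$gt`, `$gte`, `$lt`, `$lte`, `$ne`, `$in`). The `OPERATORS` table and `field_matches` helper are invented for this sketch, and it does not model how the server treats missing or null values.

```python
import operator

# A few MQL comparison operators mapped to Python equivalents.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
}

def field_matches(value, condition):
    """Check one field value against a literal or an {operator: operand} dict."""
    if isinstance(condition, dict):
        return all(OPERATORS[op](value, operand) for op, operand in condition.items())
    return value == condition

years = [2004, 2005, 2006, 2007, 2013]
after_2005 = [y for y in years if field_matches(y, {"$gt": 2005})]
in_range = [y for y in years if field_matches(y, {"$gte": 2005, "$lte": 2007})]
in_list = [y for y in years if field_matches(y, {"$in": [2005, 2006, 2007]})]
print(after_2005, in_range, in_list)
```

Several operators placed in the same sub-dictionary must all hold, which is how a range such as "between 2005 and 2007" is written.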

In [51]:
companies.find({"name": {"$regex": "^[fF].*"} }).distinct("name")

['F-Control',
 'F-Origin',
 'F2 Interactive',
 'F2G',
 'F5 Networks',
 'FABINET',
 'FACE Africa',
 'FACEinHOLE',
 'FANUC Robotics America',
 'FAROO',
 'FASTALLEY',
 'FAT Media',
 'FFWD Wheels',
 'FHOKE',
 'FINRA',
 'FIQL',
 'FIRST ROI',
 'FISCAL Technologies',
 'FK3',
 'FLIR Systems',
 'FLYPEANUT Studio',
 'FMYI',
 'FNCA',
 'FNZ',
 'FOI Corporation',
 'FORMA Therapeutics',
 'FPGA Central',
 'FQcode True Traceability',
 'FRV',
 'FRoSP',
 'FSV Payment Systems',
 'FTI Consulting',
 'FTL SOLAR',
 'FTRANS',
 'FUNFOOTER',
 'FUNKY BUSINESS',
 'FUPEI',
 'FUSE',
 'FX-BAR',
 'FXLabs Studios Private Ltd',
 'FYIndOut',
 'Fabchannel',
 'Fabian Rossano Studio',
 'Fabric Interactive',
 'Fabrique',
 'Face Your Manga',
 'FaceKoo',
 'FaceTec',
 'FaceTime Strategy',
 'Facebook',
 'Facebook Causes Application',
 'FacebookLicious!',
 'Facebookster',
 'Fachak',
 'Facilitas',
 'Factery',
 'Factiva',
 'Factonomy',
 'Factor IT',
 'Factor Technology Group',
 'Factor Three Software',
 'Fail Dogs',
 'Fail Fund',


## `$and` and `$or`

In [91]:
filt = {
    "name": {"$regex": "^[fF].*"},
    "$or":[
        {"founded_year":2005},
        {"founded_year":2006},
        {"founded_year":2007}
    ]
}

In [92]:
list(companies.find(filt))[:5]

[{'_id': ObjectId('61602944d18af5e95f243f6e'),
  'name': 'Funny Or Die',
  'permalink': 'funny-or-die',
  'crunchbase_url': 'http://www.crunchbase.com/company/funny-or-die',
  'homepage_url': 'http://funnyordie.com',
  'blog_url': 'http://www.funnyordie.com/blog',
  'blog_feed_url': 'http://www.funnyordie.com/blog/page_1/rss',
  'twitter_username': 'funnyordie',
  'category_code': 'games_video',
  'number_of_employees': None,
  'founded_year': 2007,
  'founded_month': 3,
  'founded_day': 1,
  'deadpooled_year': None,
  'deadpooled_month': None,
  'deadpooled_day': None,
  'deadpooled_url': None,
  'tag_list': 'celebrity, video, comedy',
  'alias_list': '',
  'email_address': 'suggestions@funnyordie.com',
  'phone_number': '',
  'description': 'comedy video website',
  'created_at': 'Fri Jul 06 06:07:45 UTC 2007',
  'updated_at': 'Mon Aug 26 22:39:01 UTC 2013',
  'overview': '<p><a href="http://funnyordie.com/" title="Funny Or Die" rel="nofollow">Funny Or Die</a> is a comedy video websi

In [62]:
filt = {
    "$and" : [
        {"name": {"$regex": "^[fF].*"} },
        {"founded_year":2005}
    ]
}

In [63]:
next(companies.find(filt))

{'_id': ObjectId('61602944d18af5e95f244052'),
 'name': 'Flock',
 'permalink': 'flock',
 'crunchbase_url': 'http://www.crunchbase.com/company/flock',
 'homepage_url': 'http://flock.com',
 'blog_url': 'http://www.flock.com/blog',
 'blog_feed_url': '',
 'twitter_username': 'flock',
 'category_code': 'web',
 'number_of_employees': 40,
 'founded_year': 2005,
 'founded_month': 1,
 'founded_day': 1,
 'deadpooled_year': None,
 'deadpooled_month': None,
 'deadpooled_day': None,
 'deadpooled_url': None,
 'tag_list': 'webbrowser, mozilla, social, browser, platform, innovative, transformative, disruptive, dynamic',
 'alias_list': '',
 'email_address': 'flock@kfcomm.com',
 'phone_number': '415-255-6511',
 'description': 'free social web browser',
 'created_at': 'Sat Aug 11 12:01:28 UTC 2007',
 'updated_at': 'Wed May 29 23:15:52 UTC 2013',
 'overview': '<p>Flock is a free web browser built on the Mozilla Firefox architecture.  </p>\n\n<p>Flock aggregates social networks, social media, webmail and re

The `$and` operator can be omitted in most cases, however: we simply put both conditions in the same dictionary that we pass as the parameter. Repeating the previous example:

In [None]:
filt = {
    "name": {"$regex": "^[fF].*"},
    "founded_year": 2005
}

In [64]:
next(companies.find(filt))

{'_id': ObjectId('61602944d18af5e95f244052'),
 'name': 'Flock',
 'permalink': 'flock',
 'crunchbase_url': 'http://www.crunchbase.com/company/flock',
 'homepage_url': 'http://flock.com',
 'blog_url': 'http://www.flock.com/blog',
 'blog_feed_url': '',
 'twitter_username': 'flock',
 'category_code': 'web',
 'number_of_employees': 40,
 'founded_year': 2005,
 'founded_month': 1,
 'founded_day': 1,
 'deadpooled_year': None,
 'deadpooled_month': None,
 'deadpooled_day': None,
 'deadpooled_url': None,
 'tag_list': 'webbrowser, mozilla, social, browser, platform, innovative, transformative, disruptive, dynamic',
 'alias_list': '',
 'email_address': 'flock@kfcomm.com',
 'phone_number': '415-255-6511',
 'description': 'free social web browser',
 'created_at': 'Sat Aug 11 12:01:28 UTC 2007',
 'updated_at': 'Wed May 29 23:15:52 UTC 2013',
 'overview': '<p>Flock is a free web browser built on the Mozilla Firefox architecture.  </p>\n\n<p>Flock aggregates social networks, social media, webmail and re
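
One case where `$and` cannot simply be dropped when writing filters in Python: a dictionary cannot hold the same key twice, so two conditions on the same field silently collapse into one. The filters below are illustrative and are only built, not sent to the database.

```python
# Two conditions on the same key: the second silently replaces the first.
broken = {
    "founded_year": {"$gte": 2005},
    "founded_year": {"$lte": 2007},
}
print(broken)  # only the $lte condition survives

# Option 1: put both operators in one sub-dictionary.
combined = {"founded_year": {"$gte": 2005, "$lte": 2007}}

# Option 2: spell out $and so each condition keeps its own dictionary.
explicit = {
    "$and": [
        {"founded_year": {"$gte": 2005}},
        {"founded_year": {"$lte": 2007}},
    ]
}
print(len(explicit["$and"]))
```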

## Sort, Limit and Skip

The remaining parts of the query are expressed as cursor methods, like `.distinct`. For example, to get the companies whose name starts with `f` or `F`, sorted by founding year:

In [93]:
filt = {
    "name": {"$regex": "^[fF].*"}
}
list(companies.find(filt).sort("founded_year",1))[:5]

[{'_id': ObjectId('61602944d18af5e95f243f28'),
  'name': 'Flektor',
  'permalink': 'flektor',
  'crunchbase_url': 'http://www.crunchbase.com/company/flektor',
  'homepage_url': 'http://www.flektor.com',
  'blog_url': 'http://www.flektor-blog.com',
  'blog_feed_url': 'http://www.flektor-blog.com/video_editing_software/index.rdf',
  'twitter_username': None,
  'category_code': 'games_video',
  'number_of_employees': None,
  'founded_year': None,
  'founded_month': None,
  'founded_day': None,
  'deadpooled_year': None,
  'deadpooled_month': None,
  'deadpooled_day': None,
  'deadpooled_url': None,
  'tag_list': 'flektor, photo, video',
  'alias_list': None,
  'email_address': None,
  'phone_number': None,
  'description': None,
  'created_at': 'Thu May 31 21:11:51 UTC 2007',
  'updated_at': 'Sat Nov 05 08:42:23 UTC 2011',
  'overview': '<p>Flektor is a rich-media mash-up platform that enables consumers to create, remix and share photos and videos on the internet without the need for adva

In [96]:
list(companies.find(filt).sort("founded_year",-1).limit(5))[2]

{'_id': ObjectId('61602944d18af5e95f247385'),
 'name': 'Fliggo',
 'permalink': 'fliggo',
 'crunchbase_url': 'http://www.crunchbase.com/company/fliggo',
 'homepage_url': 'http://RevziTv.com',
 'blog_url': 'http://blog.fliggo.com/',
 'blog_feed_url': 'http://blog.fliggo.com/rss.xml',
 'twitter_username': 'fliggo',
 'category_code': 'games_video',
 'number_of_employees': 2,
 'founded_year': 2012,
 'founded_month': 6,
 'founded_day': 11,
 'deadpooled_year': 2009,
 'deadpooled_month': 9,
 'deadpooled_day': 17,
 'deadpooled_url': '',
 'tag_list': 'video-sharing-community-fliggo-site-creation-platform',
 'alias_list': '',
 'email_address': 'info@fliggo.com',
 'phone_number': '857-234-2344',
 'description': 'Instant Video Communities',
 'created_at': 'Wed Feb 25 12:52:14 UTC 2009',
 'updated_at': 'Fri Dec 06 23:38:06 UTC 2013',
 'overview': '<p>Fliggo is instant video communities.</p>\n\n<p>Fliggo offers anyone the ability to create their own video-uploading site, and share the videos either p

In [95]:
list(companies.find(filt).sort("founded_year",-1).limit(5).skip(2))

[{'_id': ObjectId('61602944d18af5e95f247385'),
  'name': 'Fliggo',
  'permalink': 'fliggo',
  'crunchbase_url': 'http://www.crunchbase.com/company/fliggo',
  'homepage_url': 'http://RevziTv.com',
  'blog_url': 'http://blog.fliggo.com/',
  'blog_feed_url': 'http://blog.fliggo.com/rss.xml',
  'twitter_username': 'fliggo',
  'category_code': 'games_video',
  'number_of_employees': 2,
  'founded_year': 2012,
  'founded_month': 6,
  'founded_day': 11,
  'deadpooled_year': 2009,
  'deadpooled_month': 9,
  'deadpooled_day': 17,
  'deadpooled_url': '',
  'tag_list': 'video-sharing-community-fliggo-site-creation-platform',
  'alias_list': '',
  'email_address': 'info@fliggo.com',
  'phone_number': '857-234-2344',
  'description': 'Instant Video Communities',
  'created_at': 'Wed Feb 25 12:52:14 UTC 2009',
  'updated_at': 'Fri Dec 06 23:38:06 UTC 2013',
  'overview': '<p>Fliggo is instant video communities.</p>\n\n<p>Fliggo offers anyone the ability to create their own video-uploading site, and 
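
The last two cells both start at the same company (`Fliggo`), which fits the way these options are applied: the server sorts first, then skips, then limits, whatever order the methods were chained in. A plain-Python approximation, assuming that order and using a hypothetical `run_query` helper with a handful of documents reduced to two fields:

```python
def run_query(docs, sort_key, direction=1, skip=0, limit=None):
    """Sort first, then skip, then limit."""
    def key(doc):
        value = doc.get(sort_key)
        # Missing/None values come before numbers in ascending order.
        return (value is not None, value if value is not None else 0)
    ordered = sorted(docs, key=key, reverse=(direction == -1))
    end = None if limit is None else skip + limit
    return ordered[skip:end]

docs = [
    {"name": "Flektor", "founded_year": None},
    {"name": "Flock", "founded_year": 2005},
    {"name": "Fliggo", "founded_year": 2012},
    {"name": "Funny Or Die", "founded_year": 2007},
]
ascending = [d["name"] for d in run_query(docs, "founded_year", 1)]
page = [d["name"] for d in run_query(docs, "founded_year", -1, skip=1, limit=2)]
print(ascending, page)
```

This also shows why the ascending sort above begins with documents whose `founded_year` is `None`.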

`NOTE: To access nested keys, separate them with a dot.`

In [97]:
next(companies.find({"competitions.0.competitor.name":"Wikia"}))

{'_id': ObjectId('61602944d18af5e95f243f20'),
 'name': 'Wetpaint',
 'permalink': 'abc2',
 'crunchbase_url': 'http://www.crunchbase.com/company/wetpaint',
 'homepage_url': 'http://wetpaint-inc.com',
 'blog_url': 'http://digitalquarters.net/',
 'blog_feed_url': 'http://digitalquarters.net/feed/',
 'twitter_username': 'BachelrWetpaint',
 'category_code': 'web',
 'number_of_employees': 47,
 'founded_year': 2005,
 'founded_month': 10,
 'founded_day': 17,
 'deadpooled_year': 1,
 'tag_list': 'wiki, seattle, elowitz, media-industry, media-platform, social-distribution-system',
 'alias_list': '',
 'email_address': 'info@wetpaint.com',
 'phone_number': '206.859.6300',
 'description': 'Technology Platform Company',
 'created_at': {'$date': 1180075887000},
 'updated_at': 'Sun Dec 08 07:15:44 UTC 2013',
 'overview': '<p>Wetpaint is a technology platform company that uses its proprietary state-of-the-art technology and expertise in social media to build and monetize audiences for digital publishers.
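
In the dotted path `competitions.0.competitor.name`, each part is either a key of a nested document or, when it is a number, a position in an array. The traversal can be sketched in plain Python; `get_path` is a hypothetical helper, and the `company` dictionary keeps only the fields needed for the example.

```python
def get_path(doc, path):
    """Follow a dotted path through nested dicts and lists; None if absent."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            # A numeric part selects a position in the array.
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return None
    return current

company = {
    "name": "Wetpaint",
    "competitions": [{"competitor": {"name": "Wikia"}}],
}
print(get_path(company, "competitions.0.competitor.name"))
```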