# Prerequisites
Before we start, make sure that you have the PyMongo distribution installed. In the Python shell, the following should run without raising an exception:

In [1]:
import pymongo
import json
import re

This tutorial also assumes that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:

$ mongod

# Making a Connection with MongoClient

The first step when working with PyMongo is to create a MongoClient to the running mongod instance. Doing so is easy:

In [2]:
from pymongo import MongoClient
client = MongoClient()

The above code connects to the default host and port (localhost, port 27017). The host and port can also be passed explicitly, or the connection can be described with a MongoDB URI.
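
The explicit forms can be sketched roughly as below. `localhost` and `27017` are MongoDB's defaults; the helper names `build_mongo_uri` and `connect` are hypothetical and introduced only for illustration.

```python
def build_mongo_uri(host="localhost", port=27017):
    """Return a MongoDB URI such as 'mongodb://localhost:27017/'."""
    return "mongodb://{}:{}/".format(host, port)


def connect(host="localhost", port=27017, use_uri=False):
    """Create a MongoClient from either host/port arguments or a URI."""
    # Imported lazily so the URI helper works even without PyMongo installed.
    from pymongo import MongoClient
    if use_uri:
        return MongoClient(build_mongo_uri(host, port))
    return MongoClient(host, port)


print(build_mongo_uri())  # mongodb://localhost:27017/

# With PyMongo installed, the two calls below reach the same server:
# client = connect()              # MongoClient('localhost', 27017)
# client = connect(use_uri=True)  # MongoClient('mongodb://localhost:27017/')
```

Creating the client does not contact the server right away, so a wrong host or port usually only shows up on the first real operation.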

In [3]:
db = client.tweet_database

# Loading JSON Format Files into the Database

In [19]:
for i in range(900, 1100):
    path = 'D:/Depression_project/data/tweets_' + str(i) + '.txt'
    with open(path) as json_file:
        for line in json_file:
            item = json.loads(line)  # parse one JSON object per line into a dict
            db.tweet_collection.insert_one(item)  # insert() was removed in PyMongo 4



Once the steps above have run, the documents are stored in the database, so there is no need to reload the files in later sessions.
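
With a couple of hundred files, sending one document per round trip can be slow. A possible alternative, sketched below, collects the parsed lines into batches and hands each batch to `insert_many()`. The function name `load_json_lines`, the `batch_size` default, the UTF-8 encoding and the skipping of blank lines are all assumptions made for illustration, not part of the original workflow.

```python
import json


def load_json_lines(path, collection, batch_size=1000):
    """Insert every JSON line of `path` into `collection`; return the count."""
    batch, total = [], 0
    with open(path, encoding="utf-8") as json_file:
        for line in json_file:
            line = line.strip()
            if not line:          # skip blank lines between records
                continue
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                collection.insert_many(batch)
                total += len(batch)
                batch = []
    if batch:                     # flush the final, partially filled batch
        collection.insert_many(batch)
        total += len(batch)
    return total


# Usage against a running server (paths as in the loop above):
# for i in range(900, 1100):
#     load_json_lines('D:/Depression_project/data/tweets_' + str(i) + '.txt',
#                     db.tweet_collection)
```

The function only relies on the collection having an `insert_many()` method, so it can be tried out with a small stand-in object before pointing it at the real database.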

# Working with the Documents - Start from Here

In [20]:
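# check how many documents the collection holds
cursor = db.tweet_collection.find()
# Collection.count() was removed in PyMongo 4; count_documents({}) replaces it.
# On very large collections, estimated_document_count() is a faster approximation.
print(db.tweet_collection.count_documents({}))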

10992242


In [None]:
query = {'lang': 'en'}
cursor = db.tweet_collection.find(query)
print(db.tweet_collection.count_documents(query))  # Cursor.count() was removed in PyMongo 4
for doc in cursor[:2]:
    print(doc['text'])

In [5]:
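# English tweets whose text contains " depressed " (case-insensitive)
query = {'$and': [{'text': {'$regex': re.compile(r' depressed ', re.I)}}, {'lang': 'en'}]}
cursor = db.tweet_collection.find(query)
print(db.tweet_collection.count_documents(query))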

81


In [10]:
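# English tweets whose text contains " bipolar " (case-insensitive)
query = {'$and': [{'text': {'$regex': re.compile(r' bipolar ', re.I)}}, {'lang': 'en'}]}
cursor = db.tweet_collection.find(query)
print(db.tweet_collection.count_documents(query))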

9


In [6]:
for doc in cursor:
    print (doc['text'])

Now I’m even more depressed because the Palaye Royale show I was supposed to go to got fucking cancelled
RT @Afterhoursfeel: @NBCNews Hey y'all look @NBCNews being the propaganda arm for the prison industrial complex. How depressed of a country…
i reached that point when looking at dan and phil makes me depressed because i will never find anyone as perfect fo… https://t.co/wVMFsXvVms
y’all are bout the most depressed ass people
RT @DaddyTytee_: I was depressed for so long and nobody noticed
so nice not feeling like a depressed piece of shit
RT @AmariPrettyAss: depressed ain’t even the word .. https://t.co/uSdzmmjVHt
RT @numbrealist: https://t.co/64oup0kMrM phd and masters students 6 times more likely to be depressed than others
Dear friend, I know you are sad. Take care of yourself. I can not get close to you so you do not get depressed because of me...
ps. Think I’m depressed again and this time I can’t blame it one the birth control. 🤷🏼‍♀️
@melvam14 Depressed ass😁
I think parents sho
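
Note that the pattern `' depressed '` requires a literal space on both sides, so it can miss tweets where the word opens or closes the text or is followed by punctuation. One possible refinement, shown here with Python's `re` module, is to use `\b` word boundaries instead. The sample strings are made up for illustration and are not drawn from the collection. MongoDB's regex engine is PCRE-based, so `\b` should behave the same way on the server.

```python
import re

spaced = re.compile(r' depressed ', re.I)      # pattern used in the queries above
bounded = re.compile(r'\bdepressed\b', re.I)   # word-boundary variant

samples = [
    "I was depressed for so long",   # word surrounded by spaces
    "Depressed again today",         # word at the start of the text
    "feeling so depressed.",         # word followed by punctuation
    "undepressed is not a word",     # substring inside a longer token
]

for text in samples:
    print(bool(spaced.search(text)), bool(bounded.search(text)), text)

# The same pattern can be passed to MongoDB as a string with the 'i' option;
# several keys in one filter document are combined with an implicit AND.
query = {'text': {'$regex': r'\bdepressed\b', '$options': 'i'}, 'lang': 'en'}
```

Only the first sample matches the space-delimited pattern, while the word-boundary pattern matches the first three and still rejects the last one.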

# Writing JSON Format to a (Text) File
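
As a minimal sketch of this step, `json.dump()` serialises a Python object straight into an open file. The file name `example_tweets.json` and the sample record are hypothetical, and the file is written to a temporary directory.

```python
import json
import os
import tempfile

data = {'tweets': [{'text': 'hello world', 'lang': 'en'}]}  # made-up sample record

out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'example_tweets.json')

with open(out_path, 'w', encoding='utf-8') as outfile:
    json.dump(data, outfile)      # write the whole dict as one JSON document

print(os.path.getsize(out_path) > 0)
```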

# Reading JSON Format from a File
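
Reading is the mirror image: `json.load()` parses a file that holds one JSON document, while a file with one JSON object per line (as in the tweet dumps) is parsed line by line with `json.loads()`. The sketch below first creates two small temporary files so that it runs on its own; the file names and records are hypothetical.

```python
import json
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
doc_path = os.path.join(tmp_dir, 'single_document.json')
lines_path = os.path.join(tmp_dir, 'one_object_per_line.txt')

# Create the sample inputs (made-up records).
with open(doc_path, 'w', encoding='utf-8') as f:
    json.dump({'tweets': [{'text': 'a'}, {'text': 'b'}]}, f)
with open(lines_path, 'w', encoding='utf-8') as f:
    f.write('{"text": "a"}\n{"text": "b"}\n')

# Case 1: the file is a single JSON document.
with open(doc_path, encoding='utf-8') as f:
    parsed = json.load(f)

# Case 2: every line is its own JSON object.
with open(lines_path, encoding='utf-8') as f:
    records = [json.loads(line) for line in f if line.strip()]

print(parsed['tweets'] == records)  # True
```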

# Looping over a File and Saving It as a JSON File

In [19]:
data = {}
data['tweets'] = []
with open('tweets_1.txt') as json_file: 
    for line in json_file:
        line_new = json.loads(line)  # convert a string representation of dict to a dict
        data['tweets'].append(line_new)
        
with open('tweets1.json', 'w') as outfile:  
    json.dump(data, outfile)

# Getting a Database

A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access on MongoClient instances:

In [7]:
db = client.tweet_database
#db = client['test-database']  # dictionary-style access, needed when the name is not a valid attribute (e.g. contains a hyphen)

In [21]:
#tweets_2 = db.tweet_collection

with open('tweets1.json', 'r') as page:  # the file is closed once the block ends
    parsed = json.load(page)

In [22]:
for item in parsed["tweets"]:
    db.tweet_collection.insert_one(item)  # insert() was removed in PyMongo 4



In [23]:
# find the document in a collection
cursor = db.tweet_collection.find()
for doc in cursor[:2]:
    print (doc)

{'_id': ObjectId('5ab93ca3e1b31a0500dce79a'), 'delete': {'status': {'user_id': 1119417500, 'id_str': '688997177239011329', 'id': 688997177239011329, 'user_id_str': '1119417500'}, 'timestamp_ms': '1522076900582'}}
{'in_reply_to_screen_name': None, 'favorited': False, 'id': 978287543203647488, 'in_reply_to_status_id_str': None, 'truncated': False, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'favorite_count': 0, 'text': "i can't believe we got 4 (FOUR) teasers today", 'timestamp_ms': '1522076900660', 'retweet_count': 0, 'lang': 'en', 'in_reply_to_user_id': None, 'reply_count': 0, 'is_quote_status': False, 'place': None, '_id': ObjectId('5ab93ca3e1b31a0500dce79b'), 'in_reply_to_user_id_str': None, 'geo': None, 'user': {'geo_enabled': False, 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'id': 727189578243788800, 'name': '𝒃𝒓𝒊𝒄𝒆 #WhatisLove?', 'protected': False, 'profile_image_url_https': 'https://pbs.t

In [34]:
query = {'lang': 'en'}
cursor = db.tweet_collection.find(query)
print(db.tweet_collection.count_documents(query))  # Cursor.count() was removed in PyMongo 4
for doc in cursor[:2]:
    print (doc['text'])

2296
i can't believe we got 4 (FOUR) teasers today
Someone just rang my doorbell twice and knocked and it’s 8am. #whythough


# Delete documents or drop a collection

In [6]:
# delete all documents in a collection
db.tweet_collection.delete_many({})

# drop a collection
db.tweet_collection.drop()

# Documents
Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:
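
For instance, a post could look like the dictionary below. The field names and values are illustrative assumptions. Native Python types such as `datetime` are allowed as values and are converted to the matching BSON types by PyMongo on insert.

```python
import datetime

post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.now(tz=datetime.timezone.utc),
}

print(sorted(post))  # ['author', 'date', 'tags', 'text']
```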

# Inserting a Document
To insert a document into a collection we can use the insert_one() method:
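
A minimal sketch of that call is given below. The wrapper name `insert_post` and the collection name `posts` are hypothetical; only `insert_one()` and the `inserted_id` attribute of its result come from PyMongo.

```python
def insert_post(collection, post):
    """Insert one document and return the _id it was stored under."""
    result = collection.insert_one(post)   # returns an InsertOneResult
    return result.inserted_id


# Usage against a running server (collection name `posts` is illustrative):
# posts = db.posts
# post_id = insert_post(posts, {"author": "Mike", "text": "My first blog post!"})
```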

When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:
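
One way to do this check, assuming PyMongo 3.6 or later, is `Database.list_collection_names()`. The wrapper name `collection_names` is hypothetical and only adds sorting for stable output.

```python
def collection_names(db):
    """Return the names of all collections in `db`, sorted alphabetically."""
    return sorted(db.list_collection_names())


# Usage against a running server:
# print(collection_names(db))   # would include 'posts' after the first insert
```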


# Bulk Inserts
In order to make querying a little more interesting, let’s insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations, by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server:
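
A bulk insert could be sketched as follows. The wrapper name `insert_posts`, the collection name `posts` and the sample documents are illustrative assumptions. The two documents deliberately have different keys, since MongoDB does not require documents in a collection to share one shape.

```python
import datetime


def insert_posts(collection, posts):
    """Insert a list of documents in one call and return their _id values."""
    result = collection.insert_many(posts)  # returns an InsertManyResult
    return result.inserted_ids


new_posts = [
    {"author": "Mike", "text": "Another post!", "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "title": "MongoDB is fun", "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

# Usage against a running server:
# ids = insert_posts(db.posts, new_posts)
```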
