
In [1]:
import os
import json
import time
import boto3
import pprint
import pandas as pd
from pymongo import MongoClient

In [2]:
cd '/content/drive/My Drive/FRUS/Raw Data/Combined Data'

/content/drive/.shortcut-targets-by-id/119/FRUS/Raw Data/Combined Data


# Set up environment

**Locate AWS credentials and start EC2 instance**

In [9]:
aws_id, aws_secret_key = pd.read_csv('../../Code/DesiPilla_accessKeys.csv').values[0]
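The cell above assumes the key file is a CSV with a header row followed by a single row whose first two columns are the access key ID and the secret access key. It unpacks that row with `.values[0]`. Below is a minimal stdlib sketch of the same parsing for environments without pandas. The function name `read_aws_keys` is hypothetical, and the column order is an assumption about the file layout:

```python
import csv
import io
from typing import TextIO, Tuple


def read_aws_keys(fh: TextIO) -> Tuple[str, str]:
    """Return (access_key_id, secret_access_key) from an access-key CSV.

    Assumes (hypothetically) a header row followed by one data row whose
    first two columns are the key ID and the secret, mirroring
    pd.read_csv(...).values[0] in the cell above.
    """
    reader = csv.reader(fh)
    next(reader)        # skip the header row
    row = next(reader)  # first data row holds the key pair
    return row[0].strip(), row[1].strip()


# Fake keys for illustration only
sample = io.StringIO("Access key ID,Secret access key\nAKIAEXAMPLE,secretEXAMPLE\n")
aws_id, aws_secret_key = read_aws_keys(sample)
```

Against the real file, open it with `open('../../Code/DesiPilla_accessKeys.csv', newline='')` and pass the handle to the function. Keep the key file outside any shared or version-controlled folder.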

In [10]:
instance_id = 'i-0d78025aafddfd00f'
# Pass the region by keyword so the call is self-documenting
ec2 = boto3.resource('ec2',
                     region_name='us-east-1',
                     aws_access_key_id=aws_id,
                     aws_secret_access_key=aws_secret_key)

instance = ec2.Instance(instance_id)
instance.state

{'Code': 80, 'Name': 'stopped'}

In [26]:
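instance.start()
print('Initializing...')
i = 0
while instance.state['Name'] != 'running':
    time.sleep(5)
    i += 5
    # boto3 caches resource attributes; reload() fetches the current state,
    # otherwise instance.state never changes and the loop never exits
    instance.reload()
    print(f"\t{i} seconds elapsed... ({instance.state['Name']})")
print('The EC2 instance is now running.')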

{'Code': 64, 'Name': 'stopping'}
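boto3 resource attributes such as `instance.state` are cached after they are first loaded. A polling loop therefore has to refresh them, for example with `instance.reload()`. boto3 also provides waiters such as `instance.wait_until_running()` and `instance.wait_until_stopped()` that handle this for you. The polling pattern can be sketched without AWS, adding a timeout so a stuck instance cannot hang the notebook. `wait_for_state` and its parameters are hypothetical names, and the fake state sequence stands in for successive refreshed states:

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], target: str,
                   interval: float = 5.0, timeout: float = 300.0) -> float:
    """Poll get_state() until it returns target; return seconds waited.

    get_state must fetch a fresh value on every call (with boto3, call
    instance.reload() before reading instance.state['Name']).
    Raises TimeoutError if target is not reached within timeout seconds.
    """
    waited = 0.0
    while (state := get_state()) != target:
        if waited >= timeout:
            raise TimeoutError(f"still {state!r} after {waited:.0f} s")
        print(f"\t{waited:.0f} seconds elapsed... ({state})")
        time.sleep(interval)
        waited += interval
    return waited


# Simulated refreshes: two 'pending' polls, then 'running'
states = iter(["pending", "pending", "running"])
elapsed = wait_for_state(lambda: next(states), "running", interval=0.01)
```

With a real instance, pass a small function that calls `instance.reload()` and then returns `instance.state['Name']`.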

**Connect to the MongoDB server running on the EC2 instance**

In [3]:
client = MongoClient('mongodb://52.71.228.156:27017/')
client.list_database_names()

['FRUS', 'admin', 'config', 'local']
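The connection string above hard-codes the instance's public IPv4 address. Unless an Elastic IP is attached, EC2 assigns a new public address each time a stopped instance is started, so this address may need updating after the start cell runs. Below is a hedged sketch that assembles the URI from parts. It percent-escapes any credentials with `urllib.parse.quote_plus` and adds the `serverSelectionTimeoutMS` option so an unreachable server fails quickly instead of hanging. The helper name, its parameters, and the 5-second default are illustrative assumptions:

```python
from typing import Optional
from urllib.parse import quote_plus


def mongo_uri(host: str, port: int = 27017, user: Optional[str] = None,
              password: Optional[str] = None, timeout_ms: int = 5000) -> str:
    """Build a mongodb:// connection string (hypothetical helper)."""
    auth = ""
    if user is not None and password is not None:
        # Reserved characters such as '@' or ':' must be percent-escaped
        auth = f"{quote_plus(user)}:{quote_plus(password)}@"
    return f"mongodb://{auth}{host}:{port}/?serverSelectionTimeoutMS={timeout_ms}"


uri = mongo_uri("52.71.228.156")
```

The result can be passed straight to `MongoClient(uri)`. Running `client.admin.command('ping')` is a quick way to confirm the server is reachable before listing databases.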

**Select or create a database named `FRUS`**

In [4]:
db = client.FRUS
db.list_collection_names()

['Taft', 'AllPresidents']

# Taft Collection

**Load the prepared data and convert it to a list of JSON-style records**

In [6]:
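def prepare_collection(path):
    # Rename columns to the field names used in the MongoDB documents
    df = pd.read_csv(path).rename(columns={'website':'source', 'text':'context'})
    # Add a 'stanford' field initialised to 0 for every document
    df['stanford'] = 0
    # One dict per row: the list-of-documents format insert_many() expects
    return df.to_dict('records')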
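`prepare_collection` renames `website` to `source` and `text` to `context`, adds a `stanford` field set to 0, and returns one dict per row. This is the shape `insert_many` expects. The `'Unnamed: 0'` key in the output below is the old DataFrame index written out by `to_csv`. Passing `index_col=0` to `pd.read_csv` would drop it, although the query output later shows it was stored as-is. Here is a stdlib sketch of the same transformation, assuming the CSV has `website` and `text` columns. `prepare_records` is a hypothetical name, and unlike pandas, the `csv` module leaves every value as a string:

```python
import csv
import io
from typing import Dict, List, TextIO

# Same column renames as prepare_collection
RENAMES = {"website": "source", "text": "context"}


def prepare_records(fh: TextIO) -> List[Dict[str, object]]:
    """Read CSV rows into dicts, rename columns, and add stanford=0."""
    records = []
    for row in csv.DictReader(fh):
        record = {RENAMES.get(key, key): value for key, value in row.items()}
        record["stanford"] = 0
        records.append(record)
    return records


sample = io.StringIO("website,text\nTaft,Telegram from Paris\n")
records = prepare_records(sample)
```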

In [8]:
taft_collection = prepare_collection('taft_df.csv')
taft_collection[:2]

[{'Unnamed: 0': 106432,
  'context': 'The Ambassador in France ( Herrick ) to the Secretary of State American Embassy , Paris , July 28, 1914, 4 p.m. [ Received 7:30 p.m. ] [Telegram] To be communicated to the President: Situation in Europe is regarded here as the gravest in history. It is apprehended that civilization is threatened by demoralization which would follow a general conflagration. Demonstrations made against war here last night by laboring classes; it is said to be the first instance of its kind in France. It is felt that if Germany once mobilizes no backward step will be taken. France has strong reliance on her army but it is not giving way to undue excitement. There is faith and reliance on our high ideals and purposes, so that I believe expression from our nation would have great weight in this crisis. My opinion is encouraged at reception given utterances of British Minister for Foreign Affairs. I believe that a strong plea for delay and moderation from the President o

**Insert the records into MongoDB as the `Taft` collection**

In [9]:
db.Taft.insert_many(taft_collection, ordered=False)
print("Data has been exported to MongoDB server.")

Data has been exported to MongoDB server.


In [10]:
db.list_collection_names()

['Taft']

# All Presidents

**Load the prepared data and convert it to a list of JSON-style records**

In [11]:
all_presidents_collection = prepare_collection('all_presidents_df.csv')
all_presidents_collection[:2]

[{'Unnamed: 0': 0,
  'context': 'Memorandum of Conversation, by the Officer in Charge of West, Central, and East Africa Affairs ( Feld ) [ Washington ,] February 20, 1952 . Participants: Ford Foundation—Mr. Carl B. Spaeth Mr. John Howard Mr. Howard Tolley AF —Mr. Bourgerie Mr. Feld Mr. Meier DRN —Mr. Brown NEA/P —Mr. Fisk Mrs. Sloan Messrs. Spaeth , Howard and Tolley of the Ford Foundation came to the Department on Wednesday, February 20, 1952, to discuss in general terms the Foundation’s interest in extending its overseas activities to Africa. Mr. Bourgerie began the discussion by pointing out that, due to political considerations and suspicion of American motives, it appeared unlikely that much could be done in Portuguese possessions, and perhaps to a somewhat lesser extent, in Belgian and French possessions, although in each case for slightly different reasons. Broadly the Portuguese have not favored our sending American government or private experts to Angola and Mozambique for fea

**Insert the records into MongoDB as the `AllPresidents` collection**

In [12]:
db.AllPresidents.insert_many(all_presidents_collection, ordered=False)
print("Data has been exported to MongoDB server.")

Data has been exported to MongoDB server.


In [13]:
db.list_collection_names()

['Taft', 'AllPresidents']

# Run Query

In [5]:
collection = db.AllPresidents
query = {'source':'Taft'}
found = collection.find(query)

for doc in found[:1]:
    pprint.pprint(doc)

print("\n\n{:,} total documents found.".format(collection.count_documents(query)))

{'Unnamed: 0': 106432,
 '_id': ObjectId('5f5fb109ac046fc569f21257'),
 'context': 'The Ambassador in France ( Herrick ) to the Secretary of State '
            'American Embassy , Paris , July 28, 1914, 4 p.m. [ Received 7:30 '
            'p.m. ] [Telegram] To be communicated to the President: Situation '
            'in Europe is regarded here as the gravest in history. It is '
            'apprehended that civilization is threatened by demoralization '
            'which would follow a general conflagration. Demonstrations made '
            'against war here last night by laboring classes; it is said to be '
            'the first instance of its kind in France. It is felt that if '
            'Germany once mobilizes no backward step will be taken. France has '
            'strong reliance on her army but it is not giving way to undue '
            'excitement. There is faith and reliance on our high ideals and '
            'purposes, so that I believe expression from our nation w
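A filter such as `{'source': 'Taft'}` is an implicit equality match. A document is returned when its `source` field equals `'Taft'`, and multiple keys in one filter are combined with AND. The sketch below reproduces that rule over in-memory dicts for simple top-level fields only. It deliberately ignores query operators such as `$in`, dotted paths, MongoDB's array-element matching, and the fact that `{'x': None}` also matches documents with no `x` field. `matches` and `find` are hypothetical helpers, not part of pymongo. When printing real results, a projection such as `collection.find(query, {'context': 0})` keeps the long `context` text out of the output.

```python
from typing import Any, Dict, Iterable, List


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """True if every key in query exists in doc with an equal value."""
    return all(key in doc and doc[key] == value for key, value in query.items())


def find(docs: Iterable[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """In-memory analogue of collection.find(query) for flat equality filters."""
    return [doc for doc in docs if matches(doc, query)]


docs = [
    {"source": "Taft", "stanford": 0},
    {"source": "Truman", "stanford": 0},
    {"source": "Taft", "stanford": 1},
]
print(len(find(docs, {"source": "Taft"})))                 # 2
print(len(find(docs, {"source": "Taft", "stanford": 1})))  # 1
```

As in MongoDB, an empty filter `{}` matches every document.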

# Shut down EC2 instance

In [20]:
instance.stop()
print('Stopping...')
i = 0
while instance.state['Name'] != 'stopped':
    time.sleep(5)
    i += 5
    # Refresh the cached state; without reload() the loop never terminates
    instance.reload()
    print(f"\t{i} seconds elapsed... ({instance.state['Name']})")
print('The EC2 instance has been stopped.')

{'ResponseMetadata': {'HTTPHeaders': {'content-length': '579',
   'content-type': 'text/xml;charset=UTF-8',
   'date': 'Tue, 15 Sep 2020 18:17:37 GMT',
   'server': 'AmazonEC2',
   'x-amzn-requestid': '92ef0581-54ba-4375-82d9-ede9756dcf61'},
  'HTTPStatusCode': 200,
  'RequestId': '92ef0581-54ba-4375-82d9-ede9756dcf61',
  'RetryAttempts': 0},
 'StoppingInstances': [{'CurrentState': {'Code': 64, 'Name': 'stopping'},
   'InstanceId': 'i-0d78025aafddfd00f',
   'PreviousState': {'Code': 16, 'Name': 'running'}}]}