
### **Practical 9**

**Aim:** To implement NoSQL aggregate functions on a MongoDB database using PyMongo.

**Theory:** The term NoSQL is read by some as “non SQL” and by others as “not only SQL”.
A NoSQL database is a non-relational data management system.
It does not require a fixed schema.
It avoids joins and is easy to scale.
The main purpose of a NoSQL database is to support distributed data stores with very large storage needs.
NoSQL is used for big data and real-time web applications, for example at companies like Twitter, Facebook and Google.

To work with NoSQL we are going to use **MongoDB**.

**Introduction to MongoDB**

MongoDB is a free and open-source document-oriented database program developed by MongoDB Inc. When this practical was written, the latest stable release was 4.0.0 (21 June 2018). The MongoDB Community Server can be downloaded from the MongoDB website.

**Installing MongoDB in Google Colab**

Now we are going to install the MongoDB server in Google Colab.


# Example 1 - Bicing stations

In [None]:
dataset = "https://www.bicing.cat/availability_map/getJsonObject"     # URL of the Bicing JSON feed
!wget $dataset                                                        # Download the dataset with wget

# Uploading data to Mongo Database


--2021-07-14 10:54:02--  https://www.bicing.cat/availability_map/getJsonObject
Resolving www.bicing.cat (www.bicing.cat)... 188.166.143.182
Connecting to www.bicing.cat (www.bicing.cat)|188.166.143.182|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.bicing.barcelona/ [following]
--2021-07-14 10:54:02--  https://www.bicing.barcelona/
Resolving www.bicing.barcelona (www.bicing.barcelona)... 206.189.99.248
Connecting to www.bicing.barcelona (www.bicing.barcelona)|206.189.99.248|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘getJsonObject.1’

getJsonObject.1         [ <=>                ]  64.93K   393KB/s    in 0.2s    

2021-07-14 10:54:03 (393 KB/s) - ‘getJsonObject.1’ saved [66487]
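
The log above shows the request being redirected (`301 Moved Permanently`) to a page served as `text/html`, so the saved file may not contain the expected station data. The following is a minimal sketch of a check that could be run on the downloaded text before loading it, assuming the dataset is expected to be a JSON document. The helper name `load_json_or_none` and the sample payload are hypothetical and not part of the original notebook.

```python
import json

def load_json_or_none(text):
    """Return the parsed JSON value, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# An HTML page (such as the one returned after the redirect) is rejected
html_page = "<!DOCTYPE html><html><body>Bicing</body></html>"
print(load_json_or_none(html_page))            # None

# A JSON payload is parsed (this sample is made up for illustration)
sample = '{"stations": [{"id": 2, "bikes": "27"}]}'
print(load_json_or_none(sample))               # {'stations': [{'id': 2, 'bikes': '27'}]}
```

A `None` result would signal that the download should be inspected before anything is inserted into the database.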



## 1. Install PyMongo

In [None]:
!pip install pymongo==3.7.2 folium==0.8.3  >/dev/null      # Install PyMongo and folium for map visualization

## 2. Import libraries

In [None]:
import pymongo                            # Library to access MongoDB
from pymongo import MongoClient           # Imports MongoClient 
import pandas as pd                       # Library to work with dataframes
import folium                             # Library to visualize a map

## 3. Connect to database

In [None]:
# The URI (uniform resource identifier) defines the connection parameters
# uri = 'mongodb://USER:PASSWORD@SERVER_NAME:PORT/DATABASENAME'
# uri = 'localhost:27017'
uri = 'mongodb://u1kkdrchfjim80tclysv:FeesC2ACNmI7be61RTst@brny4kjelauboxl-mongodb.services.clever-cloud.com:27017/brny4kjelauboxl'

# start client to connect to MongoDB server 
client = MongoClient( uri )
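
The cell above embeds the user name and password directly in the URI. As a hedged alternative, the sketch below assembles the same kind of URI from environment variables and percent-escapes the credentials, which is needed when they contain characters such as `@` or `/`. The function `build_mongo_uri`, the variable names `MONGO_USER` and `MONGO_PASSWORD`, and the fallback values are assumptions made for illustration.

```python
import os
from urllib.parse import quote_plus

def build_mongo_uri(user, password, host, port, database):
    """Assemble a mongodb:// URI, percent-escaping the user name and password."""
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database)

# Hypothetical environment variable names; the fallbacks are demo values only
user = os.environ.get("MONGO_USER", "demo_user")
password = os.environ.get("MONGO_PASSWORD", "p@ss/word")

uri_example = build_mongo_uri(user, password, "localhost", 27017, "demo_db")
print(uri_example)
```

The resulting string could then be passed to `MongoClient` in place of the literal URI.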

In [None]:
# Show existing database names
print(client.list_database_names())

['brny4kjelauboxl']


In [None]:
db = client.brny4kjelauboxl               # Set the database to work on
db.list_collection_names()                # List the collections available

['bicing', 'mobileBCN']

In [None]:
collection = db.bicing                    # Collection alias

## 4. Quick data overview

In [None]:
num_documents = collection.count_documents({})                            # Counts all documents in the collection (empty filter matches everything)
print('Number of documents in database = ' + str(num_documents))
list(collection.find().limit(1))                                          # Shows the first document

Number of documents in database = 926


[{'_id': ObjectId('60ed9df249bc7ea2346a04ca'),
  'altitude': 21,
  'bikes': 27,
  'id': 2,
  'latitude': 41.39553,
  'longitude': 2.17706,
  'nearbyStations': '360, 368, 387, 414',
  'slots': 0,
  'status': 'OPN',
  'streetName': 'Roger de Flor/ Gran VĂ\xada',
  'streetNumber': 126,
  'type': 'BIKE',
  'updateTime': '01/08/18 17:43:08'}]

In [None]:
# The values of 'bikes' are stored as strings instead of numbers.
# To filter with a numeric comparison such as "greater than", the values must be converted to integers.
# The loop below performs that conversion.

bikes_list = list(collection.distinct('bikes'))             # Unique values of 'bikes' (a list of strings)
for num in bikes_list:                                      # Iterate over the list, item by item
    collection.update_many({'bikes': num}, {'$set': {'bikes': int(num)}})    # Replace each string value with the same number as an integer
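
Note that `int(num)` raises `ValueError` if a stored value is not a numeric string. The sketch below shows one way the distinct values could be screened before any update is issued. It works on a plain Python list standing in for the result of `collection.distinct('bikes')`; the helper `plan_int_conversions` and the sample values are hypothetical.

```python
def plan_int_conversions(values):
    """Return ({string: int} conversions, list of values that could not be converted)."""
    conversions, skipped = {}, []
    for value in values:
        if isinstance(value, int):
            continue                      # already numeric, nothing to do
        try:
            conversions[value] = int(value)
        except (TypeError, ValueError):
            skipped.append(value)         # e.g. '' or None: leave for manual inspection
    return conversions, skipped

conversions, skipped = plan_int_conversions(['27', '0', 3, '', None])
print(conversions)   # {'27': 27, '0': 0}
print(skipped)       # ['', None]
```

Each `old, new` pair in `conversions` could then be applied with `update_many` in the same way as in the cell above.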


## 5. Query to database: Get active stations with at least 3 bicycles

In [None]:
# Load the query result into a pandas DataFrame
filters = {'status': 'OPN', 'bikes': {'$gte': 3}}       # $gte query operator = "greater than or equal"
fields = {'_id': 1, 'latitude': 1, 'longitude': 1, 'bikes': 1, 'slots': 1}   # Projection: fields to return

query = list(collection.find(filters, fields))
df = pd.DataFrame(query)                                # Load the database reply in a pandas DataFrame

In [None]:
print('Number of active stations with at least 3 bicycles: ' + str(len(query)))

Number of active stations with at least 3 bicycles: 662


In [None]:
df.iloc[0] # prints the first DataFrame row 

_id          60ed9df249bc7ea2346a04ca
latitude                      41.3955
longitude                     2.17706
slots                               0
bikes                              27
Name: 0, dtype: object
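
The aim of this practical mentions aggregate functions, so here is a minimal sketch of how an aggregation pipeline for this collection might look. It only builds the pipeline as plain Python dictionaries and needs no server. The function name `bikes_by_status_pipeline` and the output field names `stations`, `total_bikes` and `avg_slots` are assumptions, and the sketch presumes that `bikes` and `slots` are stored as numbers.

```python
def bikes_by_status_pipeline(min_bikes=3):
    """Build an aggregation pipeline: filter stations, then group them by status."""
    return [
        # Keep only stations with at least `min_bikes` bicycles
        {'$match': {'bikes': {'$gte': min_bikes}}},
        # One output document per status value, with aggregate figures
        {'$group': {
            '_id': '$status',
            'stations': {'$sum': 1},              # number of stations in the group
            'total_bikes': {'$sum': '$bikes'},    # sum of available bicycles
            'avg_slots': {'$avg': '$slots'},      # mean number of empty slots
        }},
        # Largest total first
        {'$sort': {'total_bikes': -1}},
    ]

pipeline = bikes_by_status_pipeline()
print(pipeline)
# With a live connection: results = list(collection.aggregate(pipeline))
```

Passing the list to `collection.aggregate` would run the stages on the server in order, so only the grouped summary is returned to the notebook.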

## 6. Mark Bicing stations in map

In [None]:
center_lat = 41.378                       # Latitude of the map center
center_lon = 2.139                        # Longitude of the map center

# Create the base map centered on the chosen coordinates
locationmap = folium.Map(location=[center_lat, center_lon], zoom_start=16, width=800, height=600)
longitud = len(df)                        # Number of stations returned by the query

# Add one red marker per station, with bikes and empty slots in the popup
for i in range(longitud):
    lng = float(df.iloc[i]['longitude'])
    lat = float(df.iloc[i]['latitude'])
    description = 'Bikes: ' + str(df.iloc[i]['bikes']) + '<br> Empty slots: ' + str(df.iloc[i]['slots'])
    folium.Marker([lat, lng],
                  popup=description,
                  icon=folium.Icon(color='red')).add_to(locationmap)

locationmap                               # Display the map
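
The map center above is hard-coded. As a hedged alternative, the sketch below computes the center as the arithmetic mean of the station coordinates, which is a reasonable approximation over an area the size of a city. The helper `centroid` and the sample coordinates are made up for illustration; in the notebook the pairs would come from the `latitude` and `longitude` columns of `df`.

```python
def centroid(points):
    """Return the mean (latitude, longitude) of a non-empty list of coordinate pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)

# Made-up sample coordinates, not taken from the dataset
sample_points = [(41.0, 2.0), (42.0, 3.0)]
print(centroid(sample_points))   # (41.5, 2.5)
```

The returned pair could be used as the `location` argument of `folium.Map`.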