Formatting text in Colaboratory: A guide to Colaboratory markdown
===

## What is markdown?

Colaboratory has two types of cells: text and code. The text cells are formatted using a simple markup language called markdown, based on [the original](https://daringfireball.net/projects/markdown/syntax).

Some markdown examples:

*italics* and _italics_

**bold**

~~strikethrough~~

No indent

>One level of indentation

>>Two levels of indentation

An ordered list:

1. One
1. Two
1. Three

An unordered list:

* One
* Two
* Three

In [2]:
# this is a comment - will not execute if you run this cell
# comments are used to explain what a certain snippet of code is for
# to add a comment, just type a "#" before any text

## In this exercise, we will use pandas to load a csv file


In [3]:
# pandas is a library in Python commonly used for data analysis and data manipulation

import pandas as pd

## In this exercise, let's upload the DataSeerGrabPrizeData dataset from the case study

In [1]:
# to upload a file from your computer to this Colab notebook, we use the "files" module from google.colab
# the upload might take a while because of the size of the dataset (265k+ rows)

from google.colab import files
files.upload()

Saving DataSeerGrabPrizeData.csv to DataSeerGrabPrizeData (1).csv


{'DataSeerGrabPrizeData.csv': b'source,created_at_local,pick_up_latitude,pick_up_longitude,drop_off_latitude,drop_off_longitude,city,fare,pick_up_distance,state\r\nADR,46:18.0,14.604348,120.998654,14.53737,120.994423,Metro Manila,281.875,0.389894,CANCELLED\r\nT47,51:59.0,14.590099,121.082645,14.508611,121.019444,Metro Manila,413.125,2.20977,COMPLETED\r\nT47,21:24.0,14.582707,121.061458,14.537752,121.001379,Metro Manila,277.5,2.70291,COMPLETED\r\nADR,53:34.0,14.585812,121.060171,14.575915,121.085487,Metro Manila,220.625,0.321403,CANCELLED\r\nIOS,49:16.0,14.55201,121.05126,14.63021,120.99592,Metro Manila,378.125,0.667067,COMPLETED\r\nT47,26:18.0,14.589394,121.059928,14.444546,120.993874,Metro Manila,505,0.289595,COMPLETED\r\nIOS,27:06.0,14.58645,121.04887,14.63925,121.03681,Metro Manila,229.375,1.54755,COMPLETED\r\nT47,47:03.0,14.588782,121.097317,14.583526,121.05698,Metro Manila,211.875,2.91636,COMPLETED\r\nIOS,37:49.0,14.60688,121.08063,14.61947,121.08618,Metro Manila,163.75,0.854165,C

In [4]:
# we will use pandas to load the csv file we just uploaded into a dataframe

df = pd.read_csv('DataSeerGrabPrizeData.csv')
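The upload message above shows the file being saved as `DataSeerGrabPrizeData (1).csv`, which suggests an earlier copy already existed under the plain name, so reading by a fixed file name may load that older copy. A minimal sketch of one alternative is to parse the bytes that `files.upload()` returns. The `uploaded` dict below is a stand-in (an assumption, so the snippet runs outside Colab) that holds only the first two rows shown in the output above; `df_sample` is a hypothetical name.

```python
import io

import pandas as pd

# Stand-in for the dict returned by files.upload(): {file name: raw bytes}.
# In Colab you would write `uploaded = files.upload()` instead.
uploaded = {
    "DataSeerGrabPrizeData.csv": (
        b"source,created_at_local,pick_up_latitude,pick_up_longitude,"
        b"drop_off_latitude,drop_off_longitude,city,fare,pick_up_distance,state\r\n"
        b"ADR,46:18.0,14.604348,120.998654,14.53737,120.994423,"
        b"Metro Manila,281.875,0.389894,CANCELLED\r\n"
        b"T47,51:59.0,14.590099,121.082645,14.508611,121.019444,"
        b"Metro Manila,413.125,2.20977,COMPLETED\r\n"
    )
}

# Take the first (here, only) uploaded file and parse its bytes directly,
# so the result does not depend on which copy sits on disk.
file_name = next(iter(uploaded))
df_sample = pd.read_csv(io.BytesIO(uploaded[file_name]))

print(df_sample.shape)
```

Assigning the return value to a variable also keeps the notebook from printing the whole file as cell output.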

In [5]:
# we will use describe() to compute summary statistics for the numeric columns of the dataset we just loaded
df.describe()

| | pick_up_latitude | pick_up_longitude | drop_off_latitude | drop_off_longitude | fare | pick_up_distance |
|---|---|---|---|---|---|---|
| count | 52859.0 | 52859.0 | 52859.0 | 52859.0 | 52859.0 | 31367.0 |
| mean | 14.572785 | 121.039401 | 14.569849 | 121.028011 | 163.876729 | 1.182424 |
| std | 0.159114 | 0.097341 | 0.243406 | 1.228611 | 1250.892282 | 0.902639 |
| min | 7.035082 | 119.38805 | 1.52195 | -77.038043 | 0.0 | 0.00211 |
| 25% | 14.55289 | 121.01851 | 14.550934 | 121.016308 | 0.0 | 0.487858 |
| 50% | 14.57126 | 121.03444 | 14.56545 | 121.033653 | 181.25 | 0.952427 |
| 75% | 14.59896 | 121.057031 | 14.603396 | 121.054371 | 251.25 | 1.679155 |
| max | 16.63781 | 125.63221 | 38.90749 | 125.64556 | 201098.75 | 6.75894 |
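In the summary above, `pick_up_distance` has a lower count (31367) than the other columns (52859) because `describe()` counts only non-missing values. The sketch below reproduces that effect on a small made-up frame; `toy` and its missing last row are assumptions for illustration, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame; the last ride has no pick-up distance.
toy = pd.DataFrame({
    "fare": [281.875, 413.125, 277.5, 0.0],
    "pick_up_distance": [0.389894, 2.20977, 2.70291, np.nan],
})

summary = toy.describe()

# "count" is the number of non-missing values in each column.
print(summary.loc["count"])

# isna().sum() gives the complementary number of missing values per column.
missing = toy.isna().sum()
print(missing)
```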


In this exercise, we'll remove the rows with missing pick-up distances.

From inspection, every unallocated ride has no pick-up distance value (i.e. if the ride was unallocated, the passenger was never picked up).
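One way such an inspection could be done is to compute, for each ride state, the share of rows whose pick-up distance is missing. This is only a sketch on made-up rows: the `rides` frame is hypothetical, and the label `UNALLOCATED` is an assumed spelling, since the output shown earlier only displays `CANCELLED` and `COMPLETED`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; "UNALLOCATED" is an assumed label for unallocated rides.
rides = pd.DataFrame({
    "state": ["COMPLETED", "CANCELLED", "UNALLOCATED", "UNALLOCATED"],
    "pick_up_distance": [2.20977, 0.389894, np.nan, np.nan],
})

# isna() marks missing distances as True; the mean of booleans per state
# is the fraction of rides in that state with no pick-up distance.
missing_by_state = rides["pick_up_distance"].isna().groupby(rides["state"]).mean()

print(missing_by_state)
```

A fraction of 1.0 for a state means every ride in that state lacks a pick-up distance.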

In [6]:
# dropna() removes every row that contains at least one missing value, in any column

df = df.dropna()
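Because `dropna()` without arguments drops a row when any column is missing, it can remove more rows than intended if other columns also have gaps. A minimal sketch of restricting the check to one column with the `subset` parameter follows; the `toy` frame, including its missing `city` value, is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 lacks a city, row 2 lacks a pick-up distance.
toy = pd.DataFrame({
    "city": ["Metro Manila", None, "Metro Manila"],
    "fare": [281.875, 413.125, 0.0],
    "pick_up_distance": [0.389894, 2.20977, np.nan],
})

# Drops rows 1 and 2: each has a missing value somewhere.
dropped_any = toy.dropna()

# Drops only row 2: the check is limited to pick_up_distance.
dropped_subset = toy.dropna(subset=["pick_up_distance"])

print(len(dropped_any), len(dropped_subset))
```

In this notebook the two approaches appear to agree, since the row count after `dropna()` equals the original non-missing count of `pick_up_distance`.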

In [7]:
# we'll run describe() again to check if rows with missing values are now dropped, i.e. if output shows reduced row count

df.describe()

| | pick_up_latitude | pick_up_longitude | drop_off_latitude | drop_off_longitude | fare | pick_up_distance |
|---|---|---|---|---|---|---|
| count | 31367.0 | 31367.0 | 31367.0 | 31367.0 | 31367.0 | 31367.0 |
| mean | 14.578223 | 121.036354 | 14.575031 | 121.020971 | 276.161571 | 1.182424 |
| std | 0.047347 | 0.033301 | 0.211828 | 1.582618 | 1614.272783 | 0.902639 |
| min | 10.309494 | 120.760893 | 8.481051 | -77.038043 | 137.5 | 0.00211 |
| 25% | 14.552925 | 121.01792 | 14.55098 | 121.016308 | 194.375 | 0.487858 |
| 50% | 14.57151 | 121.033974 | 14.56524 | 121.03349 | 233.75 | 0.952427 |
| 75% | 14.599995 | 121.05642 | 14.60369 | 121.0538 | 290.625 | 1.679155 |
| max | 14.914973 | 123.895252 | 38.90749 | 124.610864 | 201098.75 | 6.75894 |


In [15]:
# we'll use another pandas function, to_csv(), to write the dataframe to a new csv file
# passing index=False leaves the row index (the row numbers) out of the file

df.to_csv('Grab_new.csv', index=False)

In [20]:
# without index=False, the row index is written as an extra first column
df.to_csv('Grab_with_row_numbers.csv')
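To see what the two files differ in, the sketch below writes a small made-up frame both ways. Calling `to_csv()` without a path returns the CSV text as a string, which makes the difference easy to print. The `toy` frame and its index values are assumptions chosen to mimic the gaps that remain in the row numbers after rows are dropped.

```python
import io

import pandas as pd

# Hypothetical frame whose index has a gap, as happens after dropping rows.
toy = pd.DataFrame({"fare": [281.875, 413.125]}, index=[0, 2])

# With no path argument, to_csv() returns the CSV content as a string.
without_index = toy.to_csv(index=False)
with_index = toy.to_csv()

print(without_index)
print(with_index)

# Reading the indexed version back produces an extra, unnamed first column,
# which pandas labels "Unnamed: 0".
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))
```

Writing with `index=False` is usually the simpler choice when the row numbers carry no meaning of their own.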

In [18]:
files.download('Grab_new.csv')


In [23]:
files.download('Grab_with_row_numbers.csv')
