# Metro data preparation

This script prepares data on the Metro station entrances in DC. The data is downloaded as a CSV file from:

LINK: opendata.dc.gov/datasets/efad2a2696164d2db65e6c16d426939e_95

After selecting the desired columns, the script exports the dataframe as a new CSV file.

Some info about the raw data:

**Metro Station Entrances**. This dataset contains points representing Metro facilities, created as part of the DC Geographic Information System (DC GIS) for the D.C. Office of the Chief Technology Officer (OCTO) and participating D.C. government agencies. Station centroids were identified from visual observation of orthophotography and extracted from the planimetric data and a centroid for each entrance was heads-up digitized from the snapbase.


In [1]:
import pandas as pd
import numpy as np

Read the data:

In [2]:
metro_entrance_df = pd.read_csv('Metro_Station_Entrances.csv', sep=',')

Check if there are **null values** in this data set:

In [3]:
len(metro_entrance_df[metro_entrance_df.notnull()]), len(metro_entrance_df)

(88, 88)

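The two counts match (88 and 88), which is read here as the Metro_Station_Entrances.csv data file containing no NaN values. Note, however, that indexing with the boolean frame returned by `notnull()` keeps every row and only masks the missing cells, so the two lengths match regardless; a per-column count such as `metro_entrance_df.isnull().sum()` is a more reliable check.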
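
As a cross-check on the null test above, the following minimal sketch uses a small made-up frame (`toy_df`, not the Metro file) to show how missing values can be counted per column with `isnull().sum()`, and why comparing lengths after boolean-frame indexing does not detect them.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; these rows are not from the Metro file.
toy_df = pd.DataFrame({
    'NAME': ['A', 'B', None],
    'Y': [38.9, np.nan, 38.8],
})

# Indexing with a boolean frame masks cells but keeps every row,
# so the length stays the same even when values are missing.
masked_len = len(toy_df[toy_df.notnull()])

# Counting nulls per column reveals the missing entries.
null_counts = toy_df.isnull().sum()
total_nulls = int(null_counts.sum())

print(masked_len, len(toy_df))
print(null_counts)
```

Applied to the real data, `metro_entrance_df.isnull().sum()` would list the number of missing values in each column.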

We proceed by selecting the columns of interest:

In [4]:
# List the column names to see what is available
list(metro_entrance_df.columns)

['\xef\xbb\xbfX',
 'Y',
 'OBJECTID_1',
 'GIS_ID',
 'NAME',
 'WEB_URL',
 'EXIT_TO_ST',
 'FEATURECOD',
 'DESCRIPTIO',
 'CAPTUREYEA',
 'LINE',
 'ADDRESS_ID']
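
The `\xef\xbb\xbf` prefix on the first column name is a UTF-8 byte order mark (BOM) at the start of the file. One possible way to keep it out of the column names is to read the file with `encoding='utf-8-sig'`. The sketch below shows this on a small in-memory sample (`sample_bytes` is a made-up stand-in for the real file); it is an alternative to, not a description of, what this script does.

```python
import io
import pandas as pd

# Hypothetical sample: a BOM followed by a two-column CSV.
sample_bytes = b'\xef\xbb\xbfX,Y\n-77.043319,38.908683\n'

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
sample_df = pd.read_csv(io.BytesIO(sample_bytes), encoding='utf-8-sig')
print(list(sample_df.columns))
```

If the file were read this way, the later cells would refer to the column simply as `'X'` instead of `'\xef\xbb\xbfX'`.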

In [5]:
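# Keep only the columns of interest; .copy() avoids a SettingWithCopyWarning
# when a new column is added to the selection later
metro_entrance_df = metro_entrance_df[['NAME', 'LINE', 'Y', '\xef\xbb\xbfX']].copy()
metro_entrance_df.head()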

| | NAME | LINE | Y | X |
|---|---|---|---|---|
| 0 | Dupont Circle | red | 38.908683 | -77.043319 |
| 1 | Friendship Heights | red | 38.960453 | -77.085765 |
| 2 | Brookland \| CUA | red | 38.933834 | -76.995045 |
| 3 | Deanwood | orange | 38.908268 | -76.934723 |
| 4 | Deanwood | orange | 38.908561 | -76.935172 |


For clarity and consistency with the other data sets, the X and Y columns are renamed to Longitude and Latitude:
* X = Longitude
* Y = Latitude

In [6]:
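# Build a display label combining the station name and its line
metro_entrance_df['Text'] = metro_entrance_df['NAME'] + '. Line: ' + metro_entrance_df['LINE']
# Rename the coordinate columns (the X column name carries the BOM prefix from the CSV)
metro_entrance_df = metro_entrance_df.rename(columns={'\xef\xbb\xbfX': 'Longitude', 'Y': 'Latitude'})
metro_entrance_df.head()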

| | NAME | LINE | Latitude | Longitude | Text |
|---|---|---|---|---|---|
| 0 | Dupont Circle | red | 38.908683 | -77.043319 | Dupont Circle. Line: red |
| 1 | Friendship Heights | red | 38.960453 | -77.085765 | Friendship Heights. Line: red |
| 2 | Brookland \| CUA | red | 38.933834 | -76.995045 | Brookland \| CUA. Line: red |
| 3 | Deanwood | orange | 38.908268 | -76.934723 | Deanwood. Line: orange |
| 4 | Deanwood | orange | 38.908561 | -76.935172 | Deanwood. Line: orange |


Finally, the dataframe is exported as a CSV file:

In [8]:
metro_entrance_df.to_csv('Metro_entrance_coordinates.csv', sep=',')
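
By default, `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. The sketch below illustrates this round trip on a one-row toy frame (`toy_df`, built here for illustration) using in-memory strings instead of files, along with two possible ways to handle the index: `index_col=0` on read, or `index=False` on write.

```python
import io
import pandas as pd

# One-row toy frame for illustration; the script itself exports the full dataframe.
toy_df = pd.DataFrame({
    'NAME': ['Dupont Circle'],
    'Latitude': [38.908683],
    'Longitude': [-77.043319],
})

# Default export: the index is written as an unnamed first column.
with_index = toy_df.to_csv()
# index=False leaves the index out of the file.
without_index = toy_df.to_csv(index=False)

reread_default = pd.read_csv(io.StringIO(with_index))
reread_indexed = pd.read_csv(io.StringIO(with_index), index_col=0)
reread_clean = pd.read_csv(io.StringIO(without_index))

print(list(reread_default.columns))
print(list(reread_indexed.columns))
print(list(reread_clean.columns))
```

Which option fits depends on how the downstream scripts read `Metro_entrance_coordinates.csv`; if they expect the index column, the default export should be kept.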