# Table of Contents

1. [Read in Data as Pandas DataFrame](#Read-in-Data-as-Pandas-DataFrame)
2. [Random Sampling](#Random-Sampling)
3. [Indexing, Selecting and Filtering DataFrame](#Indexing,-Selecting-and-Filtering-DataFrame) 

**Combining and Merging Data Sets**

4. [Concatenating Along an Axis](#Concatenating-Along-an-Axis)
5. [Database-style DataFrame Merges](#Database-style-DataFrame-Merges)
6. [Joining/ Merging on Index](#Joining/-Merging-on-Index)

**Reshaping and Pivoting**

7. [Reshaping by Melt](#Reshaping-by-Melt)
8. [Reshaping by pivoting DataFrame objects](#Reshaping-by-pivoting-DataFrame-objects)


**Data Transformation**

9. [Renaming Axis Indexes](#Renaming-Axis-Indexes)
10. [Removing Duplicates](#Removing-Duplicates)
11. [Replacing Values/ Handling Missing Values](#Replacing-Values/-Handling-Missing-Values)
12. [Discretization and Binning](#Discretization-and-Binning)
13. [Transforming Data Using a Function or Mapping](#Transforming-Data-Using-a-Function-or-Mapping)

Most of these materials are adapted from *Python for Data Analysis* by Wes McKinney.

In [213]:
# import required packages
import os
import pandas as pd
import numpy as np
import glob

%matplotlib inline

# Read in Data as Pandas DataFrame

In [214]:
#print(os.getcwd())
    
weather_full = pd.read_csv('../data/weather_description.csv')  #(45253, 37)
temp_full = pd.read_csv('../data/temperature.csv') # (45253, 37)
city_full = pd.read_csv('../data/city_attributes.csv') #(36, 4)

In [215]:
weather_full.head(5)

,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
0,2012-10-01 12:00:00,,,,,,,,,,...,,,,,,,haze,,,
1,2012-10-01 13:00:00,mist,scattered clouds,light rain,sky is clear,mist,sky is clear,sky is clear,sky is clear,sky is clear,...,broken clouds,few clouds,overcast clouds,sky is clear,sky is clear,sky is clear,haze,sky is clear,sky is clear,sky is clear
2,2012-10-01 14:00:00,broken clouds,scattered clouds,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,...,broken clouds,few clouds,sky is clear,few clouds,sky is clear,sky is clear,broken clouds,overcast clouds,sky is clear,overcast clouds
3,2012-10-01 15:00:00,broken clouds,scattered clouds,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,...,broken clouds,few clouds,sky is clear,few clouds,overcast clouds,sky is clear,broken clouds,overcast clouds,overcast clouds,overcast clouds
4,2012-10-01 16:00:00,broken clouds,scattered clouds,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,sky is clear,...,broken clouds,few clouds,sky is clear,few clouds,overcast clouds,sky is clear,broken clouds,overcast clouds,overcast clouds,overcast clouds


In [216]:
temp_full.head(5)

,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
0,2012-10-01 12:00:00,,,,,,,,,,...,,,,,,,309.1,,,
1,2012-10-01 13:00:00,284.63,282.08,289.48,281.8,291.87,291.53,293.41,296.6,285.12,...,285.63,288.22,285.83,287.17,307.59,305.47,310.58,304.4,304.4,303.5
2,2012-10-01 14:00:00,284.629041,282.083252,289.474993,281.797217,291.868186,291.533501,293.403141,296.608509,285.154558,...,285.663208,288.247676,285.83465,287.186092,307.59,304.31,310.495769,304.4,304.4,303.5
3,2012-10-01 15:00:00,284.626998,282.091866,289.460618,281.789833,291.862844,291.543355,293.392177,296.631487,285.233952,...,285.756824,288.32694,285.84779,287.231672,307.391513,304.281841,310.411538,304.4,304.4,303.5
4,2012-10-01 16:00:00,284.624955,282.100481,289.446243,281.782449,291.857503,291.553209,293.381213,296.654466,285.313345,...,285.85044,288.406203,285.860929,287.277251,307.1452,304.238015,310.327308,304.4,304.4,303.5


In [217]:
city_full.head(5)

,City,Country,Latitude,Longitude
0,Vancouver,Canada,49.24966,-123.119339
1,Portland,United States,45.523449,-122.676208
2,San Francisco,United States,37.774929,-122.419418
3,Seattle,United States,47.606209,-122.332069
4,Los Angeles,United States,34.052231,-118.243683


# Random Sampling

Working on a subset of the full dataset can speed up data wrangling and exploration, which is especially useful with large data sets.

To select a subset of k rows, we can either
1. slice off the first k rows of the DataFrame with `pandas.DataFrame.iloc` indexing (fast, but not random), or
2. randomly sample k rows without replacement with `pandas.DataFrame.sample`.

In [218]:
k = 1000

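# Option 1: the first k rows (deterministic, not random)
temp_small = temp_full.iloc[0:k]
# Option 2: k rows drawn at random without replacement; random_state makes the draw reproducible.
# This overwrites Option 1, so the output below shows the random sample.
temp_small = temp_full.sample(n=k, replace=False, random_state=0)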

temp_small.head(2)

,datetime,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
37363,2017-01-05 07:00:00,268.79,270.63,280.83,270.79,286.23,285.66,282.64,282.97,281.15,...,272.44,272.92,267.45,272.06,278.194,285.15,281.48,282.96,283.07,285.15
23197,2015-05-26 01:00:00,290.487,289.975333,287.792,290.142,293.325333,292.842,298.808667,300.658667,290.558667,...,298.192,296.958667,290.434,293.025333,287.542,288.342,294.775333,293.708667,293.708667,291.458667
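A minimal sketch of the two options on a hypothetical 100-row `toy` frame (not part of the weather data): the same `random_state` reproduces the same sample, `frac=` requests a fraction of the rows instead of a count, and the `iloc` slice always returns rows 0 to k-1.

```python
import numpy as np
import pandas as pd

# Hypothetical 100-row frame standing in for temp_full
toy = pd.DataFrame({"x": np.arange(100)})

# Same random_state -> identical rows; sampling is without replacement by default
s1 = toy.sample(n=10, random_state=0)
s2 = toy.sample(n=10, random_state=0)

# frac=0.25 asks for 25% of the rows instead of a fixed count
quarter = toy.sample(frac=0.25, random_state=0)

# The iloc slice is deterministic: always rows 0..9
first_10 = toy.iloc[0:10]

print(s1.index.equals(s2.index), len(quarter))  # True 25
```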


# Indexing, Selecting and Filtering DataFrame

- `DataFrame.filter`: Subset the rows or columns of a DataFrame according to their labels in the specified index (it does not look at cell contents).

- `DataFrame.loc`: Access a group of rows and columns by label(s) or a boolean array.
- `DataFrame.iloc`: Purely integer-location based indexing for selection by position.


Pandas Doc: 
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html
- https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
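One practical difference between `.loc` and `.iloc` slices is whether the end point is included. A small sketch, assuming a hypothetical `cities` frame indexed by city name with rounded latitudes:

```python
import pandas as pd

# Hypothetical frame indexed by city name (rounded latitudes)
cities = pd.DataFrame(
    {"Latitude": [49.25, 45.52, 37.77, 47.61]},
    index=["Vancouver", "Portland", "San Francisco", "Seattle"],
)

# .loc slices by label and INCLUDES the end label
by_label = cities.loc["Portland":"San Francisco"]

# .iloc slices by position and EXCLUDES the end position, like Python lists
by_position = cities.iloc[1:2]

print(by_label.index.tolist())     # ['Portland', 'San Francisco']
print(by_position.index.tolist())  # ['Portland']
```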


In [299]:
df = city_full.set_index('City')
df.head(2)

,Country,Latitude,Longitude
City,,,
Vancouver,Canada,49.24966,-123.119339
Portland,United States,45.523449,-122.676208


#### `DataFrame.filter(items=[list-like], like=[string], regex=[string (regular expression)], axis=[int/ string axis name])`

In [None]:
# select columns by name

df.filter(items=['Latitude', 'Longitude'])
df.filter(regex='tude$', axis=1)

In [316]:
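# select rows by index label (axis=0)

df.filter(like='San', axis=0)      # labels containing 'San'
df.filter(regex='^San', axis=0)    # labels starting with 'San'
df.filter(regex='land$', axis=0)   # labels ending with 'land'

# Only the last expression in a cell is displayed; its output is shown below
df.filter(items=['Portland', 'Seattle'], axis=0)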

,Country,Latitude,Longitude
City,,,
Portland,United States,45.523449,-122.676208
Seattle,United States,47.606209,-122.332069
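`filter` only inspects the labels on the chosen axis, never the cell values. A sketch on a hypothetical three-row frame `df_small`, contrasting it with a boolean mask:

```python
import pandas as pd

# Hypothetical miniature of the city table, indexed by City
df_small = pd.DataFrame(
    {"Country": ["Canada", "United States", "Canada"]},
    index=pd.Index(["Vancouver", "Portland", "Toronto"], name="City"),
)

# No city *name* contains 'Canada', so filter() returns an empty frame
label_hits = df_small.filter(like="Canada", axis=0)

# To select on cell values, use a boolean mask with .loc instead
value_hits = df_small.loc[df_small["Country"] == "Canada"]
```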


#### `DataFrame.iloc[<row selection>, <column selection>]`

In [324]:
city_full.head(2)

,City,Country,Latitude,Longitude
0,Vancouver,Canada,49.24966,-123.119339
1,Portland,United States,45.523449,-122.676208


In [None]:
# Single-row and single-column selections with iloc

# Rows:
city_full.iloc[0] # first row of data frame - Note a Series data type output.
city_full.iloc[1] # second row of data frame 
city_full.iloc[-1] # last row of data frame

# Columns:
city_full.iloc[:,0] # first column of data frame 
city_full.iloc[:,1] # second column of data frame 
city_full.iloc[:,-1] # last column of data frame 

In [None]:
# Multiple rows and columns can be selected together with the .iloc indexer.

# Multiple row and column selections using iloc
city_full.iloc[0:5] # first five rows of the DataFrame
city_full.iloc[:, 0:2] # first two columns, all rows
city_full.iloc[[0,3,6,24], [2,3]] # 1st, 4th, 7th and 25th rows; 3rd and 4th columns
city_full.iloc[0:5, 1:] # first five rows; every column from the 2nd onward


#### `DataFrame.loc[<row selection>, <column selection>]`

In [286]:
# Select rows with the boolean Series produced by a conditional
city_full.loc[city_full['Country'] == 'Canada']

,City,Country,Latitude,Longitude
0,Vancouver,Canada,49.24966,-123.119339
25,Toronto,Canada,43.700111,-79.416298
28,Montreal,Canada,45.508839,-73.587807


In [288]:
# ... with column labels specified
city_full.loc[city_full['Country'] == 'Canada', ['Latitude', 'Longitude']]

,Latitude,Longitude
0,49.24966,-123.119339
25,43.700111,-79.416298
28,45.508839,-73.587807


In [257]:
# ... that match multiple row values
city_full.loc[city_full['City'].isin(['New York', 'Boston'])]

,City,Country,Latitude,Longitude
27,New York,United States,40.714272,-74.005966
29,Boston,United States,42.358429,-71.059769


In [261]:
# ... that combine conditions on different columns with & (element-wise AND)
city_full.loc[city_full['City'].str.endswith("land") & city_full['Country'].str.startswith("United")]

,City,Country,Latitude,Longitude
1,Portland,United States,45.523449,-122.676208


In [268]:
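# ... with numeric comparisons; wrap each condition in parentheses because & binds more tightly than > and <=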
city_full.loc[(city_full['Latitude'] > 40) & (city_full['Longitude'] <= -40)] 

,City,Country,Latitude,Longitude
0,Vancouver,Canada,49.24966,-123.119339
1,Portland,United States,45.523449,-122.676208
3,Seattle,United States,47.606209,-122.332069
14,Minneapolis,United States,44.979969,-93.26384
16,Chicago,United States,41.850029,-87.650047
20,Detroit,United States,42.331429,-83.045753
24,Pittsburgh,United States,40.44062,-79.995888
25,Toronto,Canada,43.700111,-79.416298
27,New York,United States,40.714272,-74.005966
28,Montreal,Canada,45.508839,-73.587807
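Besides `&`, masks can be combined with `|` (element-wise OR) and negated with `~`; each comparison still needs its own parentheses. A sketch on a hypothetical four-city frame `pts`, with coordinates copied from the outputs in this notebook:

```python
import pandas as pd

# Hypothetical subset of the city table
pts = pd.DataFrame({
    "City": ["Vancouver", "Los Angeles", "Toronto", "Beersheba"],
    "Latitude": [49.24966, 34.052231, 43.700111, 31.25181],
    "Longitude": [-123.119339, -118.243683, -79.416298, 34.791302],
})

# OR: north of 45 degrees, or east of the prime meridian
north_or_east = pts.loc[(pts["Latitude"] > 45) | (pts["Longitude"] > 0)]

# NOT: rows whose longitude is not below -40
not_west = pts.loc[~(pts["Longitude"] < -40)]
```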


In [269]:
# A lambda function that yields True/False values can also be used.
city_full.loc[city_full['City'].apply(lambda x: len(x.split(' ')) == 2)] 

,City,Country,Latitude,Longitude
2,San Francisco,United States,37.774929,-122.419418
4,Los Angeles,United States,34.052231,-118.243683
5,San Diego,United States,32.715328,-117.157257
6,Las Vegas,United States,36.174969,-115.137222
10,San Antonio,United States,29.42412,-98.493629
13,Kansas City,United States,39.099731,-94.578568
15,Saint Louis,United States,38.62727,-90.197891
27,New York,United States,40.714272,-74.005966


In [274]:
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = city_full['City'].apply(lambda x: len(x.split(' ')) == 2)

# Select only the True values in 'idx' and the columns specified:
city_full.loc[idx, ['City']]  # append .reset_index(drop=True) to renumber the rows from 0

,City
2,San Francisco
4,Los Angeles
5,San Diego
6,Las Vegas
10,San Antonio
13,Kansas City
15,Saint Louis
27,New York


In [325]:
# Does it return a pandas Series or a DataFrame?

print(type(city_full.loc[city_full['Country'] == 'Canada', 'City']))  # Series when a single column label is passed
print(type(city_full.loc[city_full['Country'] == 'Canada', ['City']])) # DataFrame when a list of labels is passed

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>


----

# Combining and Merging Data Sets 

Data contained in `pandas` objects can be combined in a number of built-in ways:
- `pandas.merge` connects rows in DataFrames based on one or more keys.
- `pandas.concat` glues or stacks objects together along an axis.
- The `combine_first` instance method splices together overlapping data, filling missing values in one object with values from another.
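A minimal sketch of `combine_first` on two hypothetical Series `a` and `b`: values from `a` win where present, gaps (and labels missing from `a`) are filled from `b`, and the result is indexed by the union of both indexes.

```python
import numpy as np
import pandas as pd

# Two hypothetical readings of the same quantity; a has gaps, b is a backup
a = pd.Series([1.0, np.nan, 3.0, np.nan], index=["t0", "t1", "t2", "t3"])
b = pd.Series([10.0, 20.0, np.nan, 40.0, 50.0], index=["t0", "t1", "t2", "t3", "t4"])

# Keep a's values where present; fill a's NaNs and missing labels from b
patched = a.combine_first(b)
print(patched.tolist())  # [1.0, 20.0, 3.0, 40.0, 50.0]
```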

Pandas doc: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

### Concatenating Along an Axis 

- `pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False)` (the older `join_axes` argument was removed in pandas 1.0)

In [208]:
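# Index both frames by timestamp and drop the index name
df1 = temp_full.set_index('datetime').rename_axis(None)
df2 = weather_full.set_index('datetime').rename_axis(None)

# axis=0 (the default): the rows of df2 are stacked below the rows of df1,
# so every timestamp now appears twice in the index
result = pd.concat([df1, df2])
result.head()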

,Vancouver,Portland,San Francisco,Seattle,Los Angeles,San Diego,Las Vegas,Phoenix,Albuquerque,Denver,...,Philadelphia,New York,Montreal,Boston,Beersheba,Tel Aviv District,Eilat,Haifa,Nahariyya,Jerusalem
2012-10-01 12:00:00,,,,,,,,,,,...,,,,,,,309.1,,,
2012-10-01 13:00:00,284.63,282.08,289.48,281.8,291.87,291.53,293.41,296.6,285.12,284.61,...,285.63,288.22,285.83,287.17,307.59,305.47,310.58,304.4,304.4,303.5
2012-10-01 14:00:00,284.629,282.083,289.475,281.797,291.868,291.534,293.403,296.609,285.155,284.607,...,285.663,288.248,285.835,287.186,307.59,304.31,310.496,304.4,304.4,303.5
2012-10-01 15:00:00,284.627,282.092,289.461,281.79,291.863,291.543,293.392,296.631,285.234,284.6,...,285.757,288.327,285.848,287.232,307.392,304.282,310.412,304.4,304.4,303.5
2012-10-01 16:00:00,284.625,282.1,289.446,281.782,291.858,291.553,293.381,296.654,285.313,284.593,...,285.85,288.406,285.861,287.277,307.145,304.238,310.327,304.4,304.4,303.5
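Because the two frames above share the same datetime labels, stacking them along `axis=0` repeats every timestamp. A small sketch on two hypothetical one-column frames `t` and `w` (values copied from the first Vancouver rows) contrasting the default with `ignore_index=True` and with `axis=1` plus `keys=`:

```python
import pandas as pd

# Two hypothetical frames that share the same row index
t = pd.DataFrame({"Vancouver": [284.63, 284.629041]}, index=["13:00", "14:00"])
w = pd.DataFrame({"Vancouver": ["mist", "broken clouds"]}, index=["13:00", "14:00"])

# axis=0 (default) stacks rows: the index labels repeat
stacked = pd.concat([t, w])

# ignore_index=True discards the old labels and renumbers rows 0..n-1
renumbered = pd.concat([t, w], ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index;
# keys= adds an outer column level so the two 'Vancouver' columns stay distinct
side_by_side = pd.concat([t, w], axis=1, keys=["temp", "weather"])
```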


### Database-style DataFrame Merges

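- Merge or join operations combine data sets by linking rows using one or more keys.
- `pd.merge(left, right, how='inner', on='key')`, or `pd.merge(left, right, how='inner', left_on='left_key', right_on='right_key')` when the key columns have different names. `how` selects the merge method (default `'inner'`):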


Merge method | SQL Join Name | Description
--- | --- | ---
left | LEFT OUTER JOIN | Use keys from left frame only
right | RIGHT OUTER JOIN | Use keys from right frame only
outer | FULL OUTER JOIN | Use union of keys from both frames
inner | INNER JOIN | Use intersection of keys from both frames

Pandas doc: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
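A quick way to see the four merge methods side by side, using two hypothetical key tables `lhs` and `rhs` that overlap only on key `'b'`:

```python
import pandas as pd

# Hypothetical key tables: 'a' only on the left, 'b' shared, 'c' only on the right
lhs = pd.DataFrame({"key": ["a", "b"], "lval": [1, 2]})
rhs = pd.DataFrame({"key": ["b", "c"], "rval": [3, 4]})

results = {
    how: pd.merge(lhs, rhs, on="key", how=how)
    for how in ["left", "right", "outer", "inner"]
}

for how, frame in results.items():
    print(how, frame["key"].tolist())
# left ['a', 'b']
# right ['b', 'c']
# outer ['a', 'b', 'c']
# inner ['b']
```

Rows without a match on the other side get `NaN` in that side's columns; for `outer`, pandas sorts the keys lexicographically.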

In [189]:
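# Take the first k rows, then transpose so that each city becomes a row
temp_small = temp_full.iloc[0:k]
temp_small_t = (temp_small
                .set_index('datetime')   # timestamps become the index
                .rename_axis(None)       # drop the index name
                .transpose()             # cities -> rows, timestamps -> columns
                .rename_axis('City')     # name the new row index 'City'
                .reset_index())          # turn 'City' into a regular column for merging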


In [190]:
city_temp = pd.merge(city_full, temp_small_t, on='City')  # inner join (the default) on the shared 'City' column

city_temp.head()

,City,Country,Latitude,Longitude,2012-10-01 12:00:00,2012-10-01 13:00:00,2012-10-01 14:00:00,2012-10-01 15:00:00,2012-10-01 16:00:00,2012-10-01 17:00:00,...,2012-11-11 18:00:00,2012-11-11 19:00:00,2012-11-11 20:00:00,2012-11-11 21:00:00,2012-11-11 22:00:00,2012-11-11 23:00:00,2012-11-12 00:00:00,2012-11-12 01:00:00,2012-11-12 02:00:00,2012-11-12 03:00:00
0,Vancouver,Canada,49.24966,-123.119339,,284.63,284.629041,284.626998,284.624955,284.622911,...,276.19,278.04,278.23,278.8,278.49,278.7,277.81,276.02,274.94,274.02
1,Portland,United States,45.523449,-122.676208,,282.08,282.083252,282.091866,282.100481,282.109095,...,276.88,277.42,277.94,278.6,279.42,279.46,279.14,277.92,277.28,275.62
2,San Francisco,United States,37.774929,-122.419418,,289.48,289.474993,289.460618,289.446243,289.431869,...,285.86,286.29,286.86,286.92,287.39,287.19,286.49,284.82,283.65,282.17
3,Seattle,United States,47.606209,-122.332069,,281.8,281.797217,281.789833,281.782449,281.775065,...,275.38,278.02,278.85,279.51,279.96,279.72,278.54,277.19,276.36,274.8
4,Los Angeles,United States,34.052231,-118.243683,,291.87,291.868186,291.862844,291.857503,291.852162,...,287.53,288.52,289.0,289.22,289.63,289.33,288.49,287.93,287.23,286.13


#### Checking for duplicate keys

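Use the `validate` argument (`"one_to_one"`, `"one_to_many"`, `"many_to_one"`) to check whether the merge keys are unique as expected; if the check fails, pandas raises a `MergeError` instead of silently producing extra rows.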

In [132]:
left = pd.DataFrame({'A': [1, 2], 'B': [1, 2]})
right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 2, 2]})

left.head(5)

,A,B
0,1,1
1,2,2


In [124]:
right.head(5)

,A,B
0,4,2
1,5,2
2,6,2


In [None]:
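# B = 2 appears three times in `right`, so a one-to-one merge is impossible
try:
    result = pd.merge(left, right, on='B', validate="one_to_one")
except pd.errors.MergeError as err:
    print(err)  # pandas explains which side has non-unique keys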
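For contrast, on the same two frames `validate="one_to_many"` succeeds, because each `B` value is unique on the left even though it repeats on the right. The sketch redefines the frames so it runs on its own; it also shows how the overlapping non-key column `A` is disambiguated with the default suffixes `_x` and `_y`:

```python
import pandas as pd

# Same toy frames: 'B' is unique in left but repeated in right
left = pd.DataFrame({'A': [1, 2], 'B': [1, 2]})
right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 2, 2]})

# One left row may match many right rows, so this check passes
ok = pd.merge(left, right, on='B', validate='one_to_many')

# Both frames have a column 'A'; merge renames them with the default suffixes
print(ok.columns.tolist())  # ['A_x', 'B', 'A_y']
print(len(ok))              # 3
```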

### Joining/ Merging on Index 

In [196]:
# Index both frames by City. New names are used instead of set_index(..., inplace=True),
# so city_full keeps 'City' as a regular column for the Reshaping section below.
city_idx = city_full.set_index('City')
temp_idx = temp_small_t.set_index('City')

city_idx.head()

,Country,Latitude,Longitude
City,,,
Vancouver,Canada,49.24966,-123.119339
Portland,United States,45.523449,-122.676208
San Francisco,United States,37.774929,-122.419418
Seattle,United States,47.606209,-122.332069
Los Angeles,United States,34.052231,-118.243683


In [192]:
temp_idx.head()

,2012-10-01 12:00:00,2012-10-01 13:00:00,2012-10-01 14:00:00,2012-10-01 15:00:00,2012-10-01 16:00:00,2012-10-01 17:00:00,2012-10-01 18:00:00,2012-10-01 19:00:00,2012-10-01 20:00:00,2012-10-01 21:00:00,...,2012-11-11 18:00:00,2012-11-11 19:00:00,2012-11-11 20:00:00,2012-11-11 21:00:00,2012-11-11 22:00:00,2012-11-11 23:00:00,2012-11-12 00:00:00,2012-11-12 01:00:00,2012-11-12 02:00:00,2012-11-12 03:00:00
City,,,,,,,,,,,,,,,,,,,,,
Vancouver,,284.63,284.629041,284.626998,284.624955,284.622911,284.620868,284.618824,284.616781,284.614738,...,276.19,278.04,278.23,278.8,278.49,278.7,277.81,276.02,274.94,274.02
Portland,,282.08,282.083252,282.091866,282.100481,282.109095,282.11771,282.126324,282.134939,282.143553,...,276.88,277.42,277.94,278.6,279.42,279.46,279.14,277.92,277.28,275.62
San Francisco,,289.48,289.474993,289.460618,289.446243,289.431869,289.417494,289.403119,289.388745,289.37437,...,285.86,286.29,286.86,286.92,287.39,287.19,286.49,284.82,283.65,282.17
Seattle,,281.8,281.797217,281.789833,281.782449,281.775065,281.767681,281.760297,281.752912,281.745528,...,275.38,278.02,278.85,279.51,279.96,279.72,278.54,277.19,276.36,274.8
Los Angeles,,291.87,291.868186,291.862844,291.857503,291.852162,291.846821,291.84148,291.836139,291.830798,...,287.53,288.52,289.0,289.22,289.63,289.33,288.49,287.93,287.23,286.13


In [199]:
# join() aligns the two frames on their index and defaults to how='left'
city_temp = city_idx.join(temp_idx)
city_temp.head()

# same as
# result = pd.merge(city_idx, temp_idx, left_index=True, right_index=True, how='left')
# result.head(5)

,Country,Latitude,Longitude,2012-10-01 12:00:00,2012-10-01 13:00:00,2012-10-01 14:00:00,2012-10-01 15:00:00,2012-10-01 16:00:00,2012-10-01 17:00:00,2012-10-01 18:00:00,...,2012-11-11 18:00:00,2012-11-11 19:00:00,2012-11-11 20:00:00,2012-11-11 21:00:00,2012-11-11 22:00:00,2012-11-11 23:00:00,2012-11-12 00:00:00,2012-11-12 01:00:00,2012-11-12 02:00:00,2012-11-12 03:00:00
City,,,,,,,,,,,,,,,,,,,,,
Vancouver,Canada,49.24966,-123.119339,,284.63,284.629041,284.626998,284.624955,284.622911,284.620868,...,276.19,278.04,278.23,278.8,278.49,278.7,277.81,276.02,274.94,274.02
Portland,United States,45.523449,-122.676208,,282.08,282.083252,282.091866,282.100481,282.109095,282.11771,...,276.88,277.42,277.94,278.6,279.42,279.46,279.14,277.92,277.28,275.62
San Francisco,United States,37.774929,-122.419418,,289.48,289.474993,289.460618,289.446243,289.431869,289.417494,...,285.86,286.29,286.86,286.92,287.39,287.19,286.49,284.82,283.65,282.17
Seattle,United States,47.606209,-122.332069,,281.8,281.797217,281.789833,281.782449,281.775065,281.767681,...,275.38,278.02,278.85,279.51,279.96,279.72,278.54,277.19,276.36,274.8
Los Angeles,United States,34.052231,-118.243683,,291.87,291.868186,291.862844,291.857503,291.852162,291.846821,...,287.53,288.52,289.0,289.22,289.63,289.33,288.49,287.93,287.23,286.13
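`join` can also match a regular column of the calling frame against the other frame's index through its `on=` parameter, so the left side does not need an index at all. A sketch with hypothetical frames `cities` and `temps` (temperatures copied from the 13:00 column above; `t_13h` is an illustrative column name):

```python
import pandas as pd

# Hypothetical frames: 'City' is a regular column on the left,
# the index on the right
cities = pd.DataFrame({"City": ["Vancouver", "Portland", "Eilat"],
                       "Country": ["Canada", "United States", "Israel"]})
temps = pd.DataFrame({"t_13h": [284.63, 282.08]},
                     index=pd.Index(["Vancouver", "Portland"], name="City"))

# Match cities['City'] against temps.index; how='left' keeps every city,
# and Eilat gets NaN because it has no temperature row
joined = cities.join(temps, on="City")
print(joined)
```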


# Reshaping and Pivoting 

In [333]:
city_full.head()

,City,Country,Latitude,Longitude
0,Vancouver,Canada,49.24966,-123.119339
1,Portland,United States,45.523449,-122.676208
2,San Francisco,United States,37.774929,-122.419418
3,Seattle,United States,47.606209,-122.332069
4,Los Angeles,United States,34.052231,-118.243683


### Reshaping by Melt

- `pandas.melt()`: Unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html#pandas.DataFrame.melt

In [342]:
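# id_vars stay as identifiers; the remaining columns (Latitude, Longitude) are unpivoted.
# Defaults: var_name='variable', value_name='value'
city_long = pd.melt(city_full, id_vars=['Country', 'City'])

# or, as a method call with a custom name for the variable column:
# city_full.melt(id_vars=['Country', 'City'], var_name='quantity')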
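A self-contained sketch of the same melt on a hypothetical two-city frame `wide`, this time renaming the generated columns through `var_name` and `value_name` (the names `coord` and `degrees` are illustrative):

```python
import pandas as pd

# Hypothetical two-row slice of city_full
wide = pd.DataFrame({
    "City": ["Vancouver", "Portland"],
    "Country": ["Canada", "United States"],
    "Latitude": [49.24966, 45.523449],
    "Longitude": [-123.119339, -122.676208],
})

# Every column not listed in id_vars becomes a (coord, degrees) pair
long_df = pd.melt(wide, id_vars=["Country", "City"],
                  var_name="coord", value_name="degrees")

print(long_df.shape)  # (4, 4): 2 cities x 2 unpivoted columns
```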

### Reshaping by pivoting DataFrame objects 

- `pandas.pivot` : Pivot a DataFrame from long to wide format by given index / column values.
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html

In [368]:
city_wide = city_long.pivot(index='City', columns='variable', values='value')
city_wide.head()

variable,Latitude,Longitude
City,,
Albuquerque,35.084492,-106.651138
Atlanta,33.749001,-84.387978
Beersheba,31.25181,34.791302
Boston,42.358429,-71.059769
Charlotte,35.227089,-80.843132
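`pivot` requires each (index, column) pair to occur at most once; otherwise it raises a `ValueError`. When duplicates are expected, `pivot_table` aggregates them instead (mean by default). A sketch on hypothetical long data `long_dup`:

```python
import pandas as pd

# Hypothetical long data with a repeated (City, variable) pair
long_dup = pd.DataFrame({
    "City": ["Vancouver", "Vancouver", "Portland"],
    "variable": ["temp", "temp", "temp"],
    "value": [284.0, 286.0, 282.0],
})

pivot_error = None
try:
    long_dup.pivot(index="City", columns="variable", values="value")
except ValueError as err:
    pivot_error = err  # duplicate entries cannot be reshaped

# pivot_table averages the duplicates: Vancouver -> 285.0
wide = long_dup.pivot_table(index="City", columns="variable", values="value")
print(wide)
```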


# Data Transformation 

### Renaming Axis Indexes 

### Removing Duplicates

- `DataFrame.drop_duplicates([subset, keep, …])`: Return DataFrame with duplicate rows removed, optionally only considering certain columns.
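A sketch on a hypothetical frame `obs` (weather descriptions borrowed from the dataset) showing the default behaviour, the `subset=`/`keep=` options, and the companion `duplicated()` mask:

```python
import pandas as pd

# Hypothetical frame with one exact duplicate row and a repeated City
obs = pd.DataFrame({
    "City": ["Haifa", "Haifa", "Haifa", "Eilat"],
    "weather": ["sky is clear", "sky is clear", "haze", "haze"],
})

# duplicated() flags rows identical to an earlier row
print(obs.duplicated().tolist())  # [False, True, False, False]

# Drop rows identical in every column; the first occurrence is kept
exact = obs.drop_duplicates()

# Consider only 'City' and keep the last row per city
last_per_city = obs.drop_duplicates(subset=["City"], keep="last")
```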

### Replacing Values/ Handling Missing Values

- `DataFrame.dropna([axis, how, thresh, …])`: Remove missing values.
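- `DataFrame.fillna([value, method, axis, …])`: Fill NA/NaN values using the specified method (the `method=` argument is deprecated since pandas 2.1; use `DataFrame.ffill()` or `DataFrame.bfill()` instead).
- `DataFrame.replace([to_replace, value, …])`: Replace values given in `to_replace` with `value`.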
- `DataFrame.interpolate([method, axis, limit, …])`: Interpolate values according to different methods.
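A minimal sketch of these methods (their Series counterparts, plus `ffill()`) on a hypothetical Series `s` of hourly Kelvin temperatures with gaps, similar to the first rows of `temp_full`:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly temperatures (Kelvin) with missing readings
s = pd.Series([np.nan, 284.63, np.nan, 284.62])

dropped = s.dropna()                 # remove the missing entries
filled = s.fillna(0.0)               # replace NaN with a constant
forward = s.ffill()                  # carry the last valid value forward
interp = s.interpolate()             # linear interpolation; a leading NaN stays NaN
replaced = s.replace(284.63, 284.6)  # swap specific values
```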

### Discretization and Binning

- `pandas.cut(x, bins, …)`: Bin values into discrete intervals.
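A minimal sketch with `pd.cut`, assuming illustrative Kelvin thresholds at 273.15 K (0 °C) and 288.15 K (15 °C) and hypothetical band labels; the temperatures are taken from the sampled rows earlier:

```python
import pandas as pd

# Temperatures in Kelvin from the sampled rows above
kelvin = pd.Series([268.79, 280.83, 290.487, 298.808667])

# Right-closed bins: (250, 273.15], (273.15, 288.15], (288.15, 310]
bands = pd.cut(kelvin, bins=[250, 273.15, 288.15, 310],
               labels=["freezing", "mild", "warm"])
print(bands.tolist())  # ['freezing', 'mild', 'warm', 'warm']
```

The result is a categorical Series; pass `right=False` to make the bins left-closed instead.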

### Transforming Data Using a Function or Mapping