# PyData TLV - Pandas Tips & Tricks

<a id='toc'></a>
### Table of Contents

+ [About](#aboutme)
+ [Getting started](#getstarted)
+ [Quick Exploration & Optimizations](#quick)
 + [.info()](#info)
 + [Avoid object types](#obj)
 + [Categorize columns](#cat)
 + [Impact](#impact)
+ [Filtering and slicing DFs](#slice)
 + [Ordered categories](#orderedcats)
 + [With regex](#regex)
+ [Group By](#groupby)
 + [Multiple aggfuncs](#mulaggfunc)
 + [Custom aggfuncs](#customaggfuncs)
+ [Date manipulations without tears (hopefully)](#tears)
 + [to_period()](#toperiod)
 + [resample()](#resample)
 + [date_range()](#daterange)
 + [WoW changes](#wow)
+ [What's new in 0.19](#19)
+ [What's to come in 0.20?](#20)
+ [Contact](#contact)
+ [How I prepared the data for the talk](#massage)



<a id='aboutme'></a>
## About

This notebook was created by Alon Nir ([linkedin](https://www.linkedin.com/in/alonnir/), [twitter](https://twitter.com/alonnir), [github](https://github.com/alonnir)) for presentation at the 2nd PyData TLV meetup.<br>
Pandas' great breadth and depth means there's always another nifty tip and trick to discover. Here I share a few I use rather frequently.<br>

**If you have tips and tricks of your own, please share them** with me - I'm always eager to learn.<br> 

Also, feel free to hit me up with a Pandas question or challenge. I can't promise I'll have all the answers, but I'll try.



<a id='getstarted'></a>

## Let's get to it!

Our data is about Airbnb listings in Seattle. <br>
Taken from Kaggle ([link](https://www.kaggle.com/airbnb/seattle)) and originally from [Inside Airbnb](http://insideairbnb.com/get-the-data.html).<br>
Data was massaged a little for today's presentation ([see below](#massage)).

In [1]:
import pandas as pd
import numpy as np
import datetime

Make sure you spend some time reading the [read_csv documentation](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) - lots of valuable info there for reading CSVs.

In [2]:
airbnb_df = pd.read_csv('data/airbnb2.csv')

airbnb_df.head()

Unnamed: 0,id,host_location,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,...,price,weekly_price,monthly_price,security_deposit,cleaning_fee,minimum_nights,number_of_reviews,first_review,last_review,date_listed
0,241032,"Seattle, Washington, United States",within a few hours,73,85,f,Queen Anne,3.0,3.0,"['email', 'phone', 'reviews', 'kba']",...,85.0,595.0,2550.0,,,1,207,2011-11-01,2016-01-02,2016-05-30
1,953595,"Seattle, Washington, United States",within an hour,82,77,t,Queen Anne,6.0,6.0,"['email', 'phone', 'facebook', 'linkedin', 're...",...,150.0,1000.0,3000.0,100.0,40.0,2,43,2013-08-19,2015-12-29,2016-06-26
2,3308979,"Seattle, Washington, United States",within a few hours,98,94,f,Queen Anne,2.0,2.0,"['email', 'phone', 'google', 'reviews', 'jumio']",...,975.0,6825.0,29250.0,1000.0,300.0,4,20,2014-07-30,2015-09-03,2016-01-30
3,7421966,"Seattle, Washington, United States",,83,86,f,Queen Anne,1.0,1.0,"['email', 'phone', 'facebook', 'reviews', 'jum...",...,100.0,650.0,2300.0,,,1,0,,,2016-01-02
4,278830,"Seattle, Washington, United States",within an hour,88,79,f,Queen Anne,2.0,2.0,"['email', 'phone', 'facebook', 'reviews', 'kba']",...,450.0,3150.0,13500.0,700.0,125.0,1,38,2012-07-10,2015-10-24,2016-07-30


[Back to top](#toc)

---

<a id='quick'></a>
## Quick Exploration & Optimizations

<a id='info'></a>
### .info()

In [3]:
airbnb_df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3810 entries, 0 to 3809
Data columns (total 31 columns):
id                           3810 non-null int64
host_location                3802 non-null object
host_response_time           3289 non-null object
host_response_rate           3810 non-null int64
host_acceptance_rate         3810 non-null int64
host_is_superhost            3810 non-null object
host_neighbourhood           3513 non-null object
host_listings_count          3808 non-null float64
host_total_listings_count    3808 non-null float64
host_verifications           3810 non-null object
host_has_profile_pic         3808 non-null object
host_identity_verified       3808 non-null object
city                         3810 non-null object
state                        3810 non-null object
property_type                3809 non-null object
room_type                    3810 non-null object
accommodates                 3810 non-null int64
bathrooms                    3794 non-null fl

You can also use .memory_usage(deep=True) to see how much memory each field takes.

In [4]:
airbnb_df.memory_usage(deep=True)

Index                            72
id                            30480
host_location                292369
host_response_time           213555
host_response_rate            30480
host_acceptance_rate          30480
host_is_superhost            175260
host_neighbourhood           204661
host_listings_count           30480
host_total_listings_count     30480
host_verifications           344778
host_has_profile_pic         175232
host_identity_verified       175232
city                         198120
state                        179070
property_type                198739
room_type                    224667
accommodates                  30480
bathrooms                     30480
bedrooms                      30480
beds                          30480
bed_type                     201850
price                         30480
weekly_price                  30480
monthly_price                 30480
security_deposit              30480
cleaning_fee                  30480
minimum_nights              

<a id='obj'></a>
### Avoid object types

#### bools

In [5]:
# Example

airbnb_df['host_is_superhost'].value_counts(dropna=False)

f    3033
t     777
Name: host_is_superhost, dtype: int64

In [6]:
# Transform to a boolean
# Pandas makes it so easy

airbnb_df['host_is_superhost'] = airbnb_df['host_is_superhost'] == 't'

In [7]:
airbnb_df['host_is_superhost'].value_counts(dropna=False)

False    3033
True      777
Name: host_is_superhost, dtype: int64
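
One caveat worth a quick sketch: the `== 't'` comparison is safe here because the column has no missing values (see the .info() output above), but on a column with NaNs the comparison quietly turns them into False. The toy Series below is made-up data, not the Airbnb file, and `flags`, `as_bool` and `as_mapped` are names I picked for illustration.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (hypothetical data)
flags = pd.Series(['t', 'f', np.nan, 't'])

# Comparison: the missing value silently becomes False
as_bool = flags == 't'

# Mapping: the missing value stays missing, at the cost of an object column
as_mapped = flags.map({'t': True, 'f': False})
```

Which one you want depends on whether "unknown" should count as "no" in your analysis.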

#### dates

In [8]:
# Likewise for dates

for c in ['date_listed', 'last_review', 'first_review']:
    airbnb_df[c] = pd.to_datetime(airbnb_df[c])

# you can also use .astype('datetime64[ns]')

You can also parse date columns while reading the CSV by passing the parse_dates param, i.e. <br>
airbnb_df = pd.read_csv('data/airbnb2.csv', parse_dates=['date_listed', 'last_review', 'first_review'])
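
Here is a minimal, self-contained sketch of the same idea. It reads a tiny made-up CSV from a string instead of the real file, and it also asks read_csv for a categorical column via `dtype` (supported since 0.19, per the list further down). The rows and the `sample_df` name are assumptions for illustration only.

```python
import io
import pandas as pd

# A tiny stand-in for the CSV file (made-up rows)
csv_text = u"""id,room_type,date_listed
1,Private room,2016-05-30
2,Shared room,2016-06-26
3,Private room,2016-01-30
"""

# Parse the date column and categorize room_type in a single pass
sample_df = pd.read_csv(io.StringIO(csv_text),
                        parse_dates=['date_listed'],
                        dtype={'room_type': 'category'})
```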

<a id='cat'></a>
### Categorize columns

In [9]:
airbnb_df['room_type'].value_counts(dropna=False)

Entire home/apt    2538
Private room       1155
Shared room         117
Name: room_type, dtype: int64

In [10]:
airbnb_df['room_type'] = airbnb_df['room_type'].astype('category')

....and we'll get back to that soon!

<a id='impact'></a>
### Impact

Finally, let's check how much memory these changes saved:<br>
(Ideally you'd optimize every field you can. Another obvious way to save memory is to drop, or never import, columns you don't need, especially textual ones. In our example, for instance, I left out the host's email and first and last names entirely.)

In [11]:
airbnb_df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3810 entries, 0 to 3809
Data columns (total 31 columns):
id                           3810 non-null int64
host_location                3802 non-null object
host_response_time           3289 non-null object
host_response_rate           3810 non-null int64
host_acceptance_rate         3810 non-null int64
host_is_superhost            3810 non-null bool
host_neighbourhood           3513 non-null object
host_listings_count          3808 non-null float64
host_total_listings_count    3808 non-null float64
host_verifications           3810 non-null object
host_has_profile_pic         3808 non-null object
host_identity_verified       3808 non-null object
city                         3810 non-null object
state                        3810 non-null object
property_type                3809 non-null object
room_type                    3810 non-null category
accommodates                 3810 non-null int64
bathrooms                    3794 non-null fl
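
If you'd like to put a number on the saving, something along these lines should work. This is a sketch on a synthetic DataFrame, not the talk's data; `toy`, `before`, `after` and `saving_pct` are illustrative names, and the exact percentage will vary with your data and pandas version.

```python
import numpy as np
import pandas as pd

n = 10000
toy = pd.DataFrame({
    'room_type': np.random.choice(
        ['Entire home/apt', 'Private room', 'Shared room'], size=n),
    'is_superhost': np.random.choice(['t', 'f'], size=n),
})

# Total bytes with both columns stored as text
before = toy.memory_usage(deep=True).sum()

# Same conversions as above: category for repeated labels, bool for flags
toy['room_type'] = toy['room_type'].astype('category')
toy['is_superhost'] = toy['is_superhost'] == 't'

after = toy.memory_usage(deep=True).sum()
saving_pct = 100.0 * (before - after) / before
```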

[Back to top](#toc)

---

<a id='slice'></a>
## Filtering and slicing DFs

We'll skip the basics (.loc, .iloc etc.) and go straight to some cool stuff:

<a id='orderedcats'></a>
### Ordered categories

One of the cool things about using the category type is that you can 'rank' the different values:

In [12]:
airbnb_df['room_type'].value_counts()

Entire home/apt    2538
Private room       1155
Shared room         117
dtype: int64

In [13]:
airbnb_df['room_type'] = airbnb_df['room_type'].cat.set_categories(['Shared room', 'Private room', 'Entire home/apt'], 
                                                                   ordered=True)

In [14]:
print(airbnb_df[airbnb_df['room_type']>='Private room'].shape)

airbnb_df[airbnb_df['room_type']>='Private room']['room_type'].value_counts()

(3693, 31)


Entire home/apt    2538
Private room       1155
Shared room           0
dtype: int64
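
Ordering helps with more than filtering. A small sketch on a toy Series (made-up values, with `rooms` and friends as illustrative names) showing that sorting and min/max also follow the category order rather than the alphabet:

```python
import pandas as pd

rooms = pd.Series(['Private room', 'Shared room',
                   'Entire home/apt', 'Private room'])
order = ['Shared room', 'Private room', 'Entire home/apt']
rooms = rooms.astype('category').cat.set_categories(order, ordered=True)

# Comparisons use the category order
at_least_private = rooms[rooms >= 'Private room']

# So do sorting and min/max
sorted_rooms = rooms.sort_values()
smallest, largest = rooms.min(), rooms.max()
```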

<a id='regex'></a>
### With regex

For example, let's say we want to see all columns that tell us something about the host:

In [15]:
airbnb_df.filter(like='host', axis=1).head()

Unnamed: 0,host_location,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified
0,"Seattle, Washington, United States",within a few hours,73,85,False,Queen Anne,3.0,3.0,"['email', 'phone', 'reviews', 'kba']",t,t
1,"Seattle, Washington, United States",within an hour,82,77,True,Queen Anne,6.0,6.0,"['email', 'phone', 'facebook', 'linkedin', 're...",t,t
2,"Seattle, Washington, United States",within a few hours,98,94,False,Queen Anne,2.0,2.0,"['email', 'phone', 'google', 'reviews', 'jumio']",t,t
3,"Seattle, Washington, United States",,83,86,False,Queen Anne,1.0,1.0,"['email', 'phone', 'facebook', 'reviews', 'jum...",t,t
4,"Seattle, Washington, United States",within an hour,88,79,False,Queen Anne,2.0,2.0,"['email', 'phone', 'facebook', 'reviews', 'kba']",t,t


Or if we want to see all fees and prices:

In [16]:
airbnb_df.filter(regex='fee|price|deposit', axis=1).head()

Unnamed: 0,price,weekly_price,monthly_price,security_deposit,cleaning_fee
0,85.0,595.0,2550.0,,
1,150.0,1000.0,3000.0,100.0,40.0
2,975.0,6825.0,29250.0,1000.0,300.0
3,100.0,650.0,2300.0,,
4,450.0,3150.0,13500.0,700.0,125.0
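
The regex is matched anywhere in the column name, so anchors come in handy when `like` is too greedy. A quick sketch on an empty DataFrame whose column names I made up for the example:

```python
import pandas as pd

# Only the column names matter here, so the frame has no rows
cols_df = pd.DataFrame(columns=['host_location', 'superhost_flag',
                                'price', 'weekly_price', 'price_notes'])

# like= is a plain substring match
contains_host = cols_df.filter(like='host').columns.tolist()

# ^ and $ pin the pattern to the start or end of the name
starts_with_host = cols_df.filter(regex='^host_').columns.tolist()
ends_with_price = cols_df.filter(regex='price$').columns.tolist()
```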


[Back to top](#toc)

---

<a id='groupby'></a>
## Group By

<a id='mulaggfunc'></a>
### Multiple aggfuncs

One of the cool things about groupby is that we can apply a different aggregation function (or multiple functions!) to each measure:

In [17]:
airbnb_df.groupby('property_type').agg({'accommodates':[np.mean, np.median], 'bedrooms':max, 'bathrooms':np.median})

,bedrooms,bathrooms,accommodates,accommodates
property_type,max,median,mean,median
Apartment,4.0,1.0,3.105572,3.0
Bed & Breakfast,1.0,1.0,2.216216,2.0
Boat,4.0,1.5,3.125,2.0
Bungalow,4.0,1.0,2.846154,2.0
Cabin,2.0,1.0,2.666667,2.0
Camper/RV,1.0,1.0,2.615385,2.0
Chalet,1.0,1.0,2.5,2.5
Condominium,3.0,1.0,3.406593,4.0
Dorm,1.0,4.0,8.0,8.0
House,7.0,1.0,3.647569,3.0


(we can flatten the hierarchical index if we'd like to:)

In [18]:
airbnbgp = airbnb_df.groupby('property_type').agg({'accommodates':[np.mean, np.median], 
                                                   'bedrooms':max, 'bathrooms':np.median})

# I'm not familiar with a better way to flatten the multi-index on headers. 
# Doing this explicitly below for sake of presentation.

top = list(airbnbgp.columns.get_level_values(0))
bottom = list(airbnbgp.columns.get_level_values(1))
flat = [top[i]+'_'+bottom[i] for i in range(len(top))]
airbnbgp.columns = flat

airbnbgp

property_type,bedrooms_max,bathrooms_median,accommodates_mean,accommodates_median
Apartment,4.0,1.0,3.105572,3.0
Bed & Breakfast,1.0,1.0,2.216216,2.0
Boat,4.0,1.5,3.125,2.0
Bungalow,4.0,1.0,2.846154,2.0
Cabin,2.0,1.0,2.666667,2.0
Camper/RV,1.0,1.0,2.615385,2.0
Chalet,1.0,1.0,2.5,2.5
Condominium,3.0,1.0,3.406593,4.0
Dorm,1.0,4.0,8.0,8.0
House,7.0,1.0,3.647569,3.0
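
A more compact option that should give the same result: each column label of the aggregated frame is a tuple such as ('accommodates', 'mean'), so joining the tuple does the flattening in one line. Sketched here on a toy DataFrame with made-up numbers; `toy` and `grouped` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    'property_type': ['House', 'House', 'Boat'],
    'accommodates': [4, 2, 3],
    'bedrooms': [2, 1, 1],
})

grouped = toy.groupby('property_type').agg(
    {'accommodates': ['mean', 'median'], 'bedrooms': ['max']})

# Each label is a (measure, aggfunc) tuple; join it into one string
grouped.columns = ['_'.join(col) for col in grouped.columns]
```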


<a id='customaggfuncs'></a>
### Custom aggfuncs

What's _really_ cool is that you can use a custom aggfunc. For example:

In [19]:
airbnb_df.groupby('property_type')['bed_type'].agg(lambda x: (x=='Real Bed').sum()*0.75)

property_type
Apartment          1213.50
Bed & Breakfast      27.00
Boat                  5.25
Bungalow              9.75
Cabin                14.25
Camper/RV             9.00
Chalet                1.50
Condominium          66.75
Dorm                  1.50
House              1254.75
Loft                 28.50
Other                15.00
Tent                  1.50
Townhouse            84.75
Treehouse             2.25
Yurt                  0.75
Name: bed_type, dtype: float64
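
A lambda is fine for a one-off, but a named function is easier to read and reuse. The sketch below computes the share of real beds per group instead of a scaled count; the data is made up and `share_of_real_beds` is a hypothetical helper, not something from the talk.

```python
import pandas as pd

toy = pd.DataFrame({
    'property_type': ['House', 'House', 'House', 'Boat'],
    'bed_type': ['Real Bed', 'Futon', 'Real Bed', 'Couch'],
})

def share_of_real_beds(beds):
    """Fraction of listings in the group whose bed_type is 'Real Bed'."""
    return (beds == 'Real Bed').mean()

real_bed_share = toy.groupby('property_type')['bed_type'].agg(share_of_real_beds)
```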

[Back to top](#toc)

---

<a id='tears'></a>
## Date manipulations without tears (hopefully)

<a id='toperiod'></a>
### to_period()

Let's add the month-year of listing

In [20]:
airbnb_df['listed_year_month'] = airbnb_df['date_listed'].dt.to_period('M')

In [21]:
airbnb_df.loc[0:5, ['date_listed', 'listed_year_month']]

Unnamed: 0,date_listed,listed_year_month
0,2016-05-30,2016-05
1,2016-06-26,2016-06
2,2016-01-30,2016-01
3,2016-01-02,2016-01
4,2016-07-30,2016-07
5,2016-02-06,2016-02


What if we want to know what day of the week the listing was added?

In [22]:
airbnb_df['day_of_week'] = airbnb_df['date_listed'].dt.dayofweek  # Number, Monday == 0

airbnb_df.loc[0:5, ['date_listed', 'listed_year_month', 'day_of_week']]

Unnamed: 0,date_listed,listed_year_month,day_of_week
0,2016-05-30,2016-05,0
1,2016-06-26,2016-06,6
2,2016-01-30,2016-01,5
3,2016-01-02,2016-01,5
4,2016-07-30,2016-07,5
5,2016-02-06,2016-02,5
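
Numbers are handy for filtering, names are nicer to read. A small sketch using three of the dates above: it maps the day number to a name with a plain list (which avoids depending on any particular pandas version) and flags weekends. `day_names`, `named` and `is_weekend` are names I chose for the example.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2016-05-30', '2016-06-26', '2016-01-30']))

# Monday == 0, so the list position doubles as the lookup key
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
named = dates.dt.dayofweek.map(lambda d: day_names[d])

# Saturday (5) and Sunday (6)
is_weekend = dates.dt.dayofweek >= 5
```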


<a id='resample'></a>
### resample()

Let's look at some aggregate numbers:

In [23]:
# Number of new listings per day

pt = pd.pivot_table(airbnb_df, index='date_listed', values='id', aggfunc=[len])

pt.rename(columns={'len':'num_listings'}, inplace=True)

pt.head(10)

date_listed,num_listings
2016-01-01,3
2016-01-02,13
2016-01-03,10
2016-01-04,8
2016-01-05,8
2016-01-06,8
2016-01-07,16
2016-01-08,17
2016-01-09,6
2016-01-10,11


Now let's say we want to look at the data weekly instead of daily: <br>
(there are [so many](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) options)

In [24]:
pt_weekly = pt.resample('W').sum()

pt_weekly.head(10)

date_listed,num_listings
2016-01-03,26
2016-01-10,74
2016-01-17,60
2016-01-24,68
2016-01-31,76
2016-02-07,85
2016-02-14,85
2016-02-21,46
2016-02-28,66
2016-03-06,68
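
To get a feel for the offset aliases, here is a sketch on a synthetic series with one event per day in January 2016 (made-up data). 'W' means weeks ending on Sunday, 'W-MON' moves the week boundary to Monday, and 'MS' labels each month by its first day.

```python
import pandas as pd

days = pd.date_range('2016-01-01', '2016-01-31', freq='D')
daily = pd.Series(1, index=days)

weekly_sun = daily.resample('W').sum()      # weeks ending on Sunday
weekly_mon = daily.resample('W-MON').sum()  # weeks ending on Monday
monthly = daily.resample('MS').sum()        # labelled by month start
```

Note that the first weekly bucket is a partial week (January 1st, 2016 was a Friday), which is why the first row in the weekly table above looks so low.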


<a id='daterange'></a>
### date_range()

Now let's say some of the data was corrupted due to some technical error.

In [25]:
pt_weekly.drop(pt_weekly.index[[3, 5, 8]], inplace=True)
pt_weekly.head(7)

date_listed,num_listings
2016-01-03,26
2016-01-10,74
2016-01-17,60
2016-01-31,76
2016-02-14,85
2016-02-21,46
2016-03-06,68


We'll use date_range to generate all the dates we'd expect to see.

In [26]:
weeks = pd.date_range(start=datetime.date(2016,1,1),
                      end=datetime.datetime(2016,12,31),
                      freq='W')

weeks

DatetimeIndex(['2016-01-03', '2016-01-10', '2016-01-17', '2016-01-24',
               '2016-01-31', '2016-02-07', '2016-02-14', '2016-02-21',
               '2016-02-28', '2016-03-06', '2016-03-13', '2016-03-20',
               '2016-03-27', '2016-04-03', '2016-04-10', '2016-04-17',
               '2016-04-24', '2016-05-01', '2016-05-08', '2016-05-15',
               '2016-05-22', '2016-05-29', '2016-06-05', '2016-06-12',
               '2016-06-19', '2016-06-26', '2016-07-03', '2016-07-10',
               '2016-07-17', '2016-07-24', '2016-07-31', '2016-08-07',
               '2016-08-14', '2016-08-21', '2016-08-28', '2016-09-04',
               '2016-09-11', '2016-09-18', '2016-09-25', '2016-10-02',
               '2016-10-09', '2016-10-16', '2016-10-23', '2016-10-30',
               '2016-11-06', '2016-11-13', '2016-11-20', '2016-11-27',
               '2016-12-04', '2016-12-11', '2016-12-18', '2016-12-25'],
              dtype='datetime64[ns]', freq='W-SUN')

And we'll add blank rows for the missing periods:

In [27]:
print('%d rows in the DataFrame before adding rows for missing weeks.' % pt_weekly.shape[0])

for w in list(weeks):
    if w not in list(pt_weekly.index):
        print('Adding %s' % w)
        pt_weekly.loc[w] = None
        
print('%d rows in the DataFrame after adding missing rows.' % pt_weekly.shape[0])

50 rows in the DataFrame before adding rows for missing weeks.
Adding 2016-01-24 00:00:00
Adding 2016-02-07 00:00:00
Adding 2016-02-28 00:00:00
53 rows in the DataFrame after adding missing rows.
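
The loop makes the idea explicit, but reindex can probably do the same job in one step. A sketch on a toy weekly frame (first six values borrowed from the table above, one week dropped on purpose). I take the union of the existing index and the expected weeks so that, like the loop, nothing already in the frame gets dropped; `toy_weekly`, `full_index` and `restored` are illustrative names.

```python
import pandas as pd

weeks = pd.date_range(start='2016-01-03', periods=6, freq='W')
toy_weekly = pd.DataFrame({'num_listings': [26, 74, 60, 68, 76, 85]},
                          index=weeks)

# Simulate a corrupted week
toy_weekly = toy_weekly.drop(toy_weekly.index[[3]])

# Existing rows plus every expected week, sorted
full_index = toy_weekly.index.union(weeks)

# Missing weeks come back as NaN rows
restored = toy_weekly.reindex(full_index)
```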


In [28]:
pt_weekly.sort_index(inplace=True)

pt_weekly.head(10)

Unnamed: 0,num_listings
2016-01-03,26.0
2016-01-10,74.0
2016-01-17,60.0
2016-01-24,
2016-01-31,76.0
2016-02-07,
2016-02-14,85.0
2016-02-21,46.0
2016-02-28,
2016-03-06,68.0


Now what do we do about the missing values?<br>
Easy solutions: back fill (bfill: **next** valid value) or forward fill (ffill: **previous** valid value).<br>
We can (and in many cases should) also think of more sophisticated methods like a mean of the two closest values in time or fitting to a trend, but that's out of today's scope.

In [29]:
pt_weekly.fillna(method='bfill', inplace=True)
pt_weekly.head(10)

Unnamed: 0,num_listings
2016-01-03,26.0
2016-01-10,74.0
2016-01-17,60.0
2016-01-24,76.0
2016-01-31,76.0
2016-02-07,85.0
2016-02-14,85.0
2016-02-21,46.0
2016-02-28,68.0
2016-03-06,68.0
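
For comparison, here is a sketch of the three options on the first gap from the table above (60, missing, 76). The "mean of the two closest values" idea is what linear interpolation gives you when a single value is missing. The variable names are mine.

```python
import numpy as np
import pandas as pd

s = pd.Series([60.0, np.nan, 76.0],
              index=pd.to_datetime(['2016-01-17', '2016-01-24', '2016-01-31']))

back = s.bfill()          # next valid value
forward = s.ffill()       # previous valid value
linear = s.interpolate()  # straight line between the neighbours
```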


<a id='wow'></a>
### WoW changes

We can use _diff_ or _shift_ to see how numbers change row to row (or several rows). For example:

In [30]:
pt_weekly['WoW'] = pt_weekly['num_listings'].diff(periods=1)

# Alternatively:
#pt_weekly['WoW'] = pt_weekly['num_listings'] - pt_weekly['num_listings'].shift(1)

pt_weekly.head(10)

Unnamed: 0,num_listings,WoW
2016-01-03,26.0,
2016-01-10,74.0,48.0
2016-01-17,60.0,-14.0
2016-01-24,76.0,16.0
2016-01-31,76.0,0.0
2016-02-07,85.0,9.0
2016-02-14,85.0,0.0
2016-02-21,46.0,-39.0
2016-02-28,68.0,22.0
2016-03-06,68.0,0.0
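
If you'd rather see the change in percent, pct_change does the division for you. A sketch using the first four weekly values from the table above; the manual shift version is included only to show what it computes.

```python
import pandas as pd

weekly = pd.Series([26.0, 74.0, 60.0, 76.0])

wow_abs = weekly.diff()              # absolute change
wow_pct = weekly.pct_change() * 100  # relative change, in percent

# Same thing spelled out with shift
wow_pct_manual = (weekly / weekly.shift(1) - 1) * 100
```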


[Back to top](#toc)

---

<a id='19'></a>
## What's new in 0.19

From the [documentation](http://pandas.pydata.org/pandas-docs/version/0.19.0/whatsnew.html):

+ merge_asof() for asof-style time-series joining
+ .rolling() is now time-series aware
+ read_csv() now supports parsing Categorical data
+ A function union_categoricals() has been added for combining categoricals
+ PeriodIndex now has its own period dtype, and changed to be more consistent with other Index classes. 
+ Sparse data structures gained enhanced support of int and bool dtypes
+ Comparison operations with Series no longer ignore the index (see the linked documentation for an overview of the API changes).
+ Introduction of a pandas development API for utility functions
+ Deprecation of Panel4D and PanelND. We recommend to represent these types of n-dimensional data with the xarray package.
+ Removal of the previously deprecated modules pandas.io.data, pandas.io.wb, pandas.tools.rplot.
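
To make the first two items a bit more concrete, here is a rough sketch on made-up data (the `quotes` and `bookings` frames are hypothetical, not part of the Airbnb set). merge_asof matches each left row with the most recent right row at or before its time, and a time-aware rolling window is defined by a duration such as '2D' rather than a row count.

```python
import pandas as pd

quotes = pd.DataFrame({
    'time': pd.to_datetime(['2016-01-01 09:00', '2016-01-01 09:05',
                            '2016-01-01 09:10']),
    'price': [100.0, 101.0, 102.0],
})
bookings = pd.DataFrame({
    'time': pd.to_datetime(['2016-01-01 09:03', '2016-01-01 09:10']),
    'guest': ['a', 'b'],
})

# For each booking, take the latest quote at or before the booking time
matched = pd.merge_asof(bookings, quotes, on='time')

# Irregularly spaced daily data: note the gap between Jan 2 and Jan 5
daily = pd.Series([1.0, 2.0, 3.0, 4.0],
                  index=pd.to_datetime(['2016-01-01', '2016-01-02',
                                        '2016-01-05', '2016-01-06']))

# Sum over the trailing two days, however many rows that covers
rolling_2d = daily.rolling('2D').sum()
```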


<a id='20'></a>
## What's to come in 0.20?
From the [documentation](http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#v0-20-0-2017):

+ New features
 + dtype keyword for data IO
 + Groupby Enhancements
 + Better support for compressed URLs in read_csv
 + UInt64 Support Improved
 + Other enhancements

+ Backwards incompatible API changes
 + **Deprecate .ix**
 + Map on Index types now return other Index types
 + S3 File Handling
 + Partial String Indexing Changes
 + Memory Usage for Index is more Accurate
 + Groupby Describe Formatting
 + Other API Changes
 + Deprecations
 + Removal of prior version deprecations/changes


[Back to top](#toc)

---

<a id='contact'></a>
## Contact
For tips and tricks, questions and comments, feel free to contact me through any of the channels listed [above](#aboutme).

---

# Appendix

<a id='massage'></a>
## How I prepared the data for the talk

In [None]:
# Original data set can be found under data/listings.csv. I inspected it and decided to focus on the following cols:

cols = ['id', 'host_location', 'city', 'state', 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 
        'host_is_superhost', 'host_neighbourhood', 'host_listings_count', 'host_total_listings_count', 
        'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'property_type', 'room_type', 
        'accommodates', 'bathrooms', 'bedrooms', 'beds', 'bed_type', 'price', 'weekly_price', 'monthly_price', 
        'security_deposit', 'cleaning_fee', 'minimum_nights', 'number_of_reviews', 'first_review', 'last_review']

In [None]:
airbnb_df = pd.read_csv('data/listings.csv', usecols=cols)

In [None]:
# Reformatting some of the prices from '$xx.xx' to xx.xx:

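def remove_ds(x):
    # Strip the dollar sign and thousands separators, then convert to float.
    # Missing values are NaN floats with no .replace, hence the AttributeError.
    try:
        return float(x.replace('$', '').replace(',', ''))
    except (AttributeError, ValueError):
        return None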

for c in ['price', 'security_deposit', 'cleaning_fee', 'weekly_price', 'monthly_price']:    
    airbnb_df[c] = airbnb_df[c].apply(remove_ds)
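
A vectorized alternative, in case apply gets slow on bigger files. This is a sketch on a toy Series with made-up values, and it assumes a pandas version where str.replace accepts the `regex` keyword (0.23 or later, so newer than the version used in the talk).

```python
import pandas as pd

raw = pd.Series(['$85.00', '$1,000.00', None, 'n/a'])

# Drop '$' and ',' as literal characters, then let to_numeric
# turn anything unparseable into NaN
stripped = (raw.str.replace('$', '', regex=False)
               .str.replace(',', '', regex=False))
cleaned = pd.to_numeric(stripped, errors='coerce')
```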

In [None]:
# fillna
airbnb_df['weekly_price'].fillna(airbnb_df['price']*7, inplace=True)
airbnb_df['monthly_price'].fillna(airbnb_df['price']*30, inplace=True)

In [None]:
# Removing some 'scraping debris'
airbnb_df = airbnb_df[airbnb_df['city']=='Seattle'].copy()  # copy so later column assignments don't hit a view

airbnb_df['city'].value_counts(dropna=False)

In [None]:
# For convenience, replaced the acceptance and response rates (mostly high 90s or NaNs) with random values.

airbnb_df['host_acceptance_rate'] = np.random.randint(70, 100, size=airbnb_df.shape[0])

airbnb_df['host_response_rate'] = np.random.randint(70, 100, size=airbnb_df.shape[0])

In [None]:
# fillna 

airbnb_df['host_is_superhost'].fillna('f', inplace=True)

In [None]:
import datetime
import random

# For ease of presentation, added a date_listed column with random dates, all in 2016.


start_date = datetime.date(2016, 1, 1).toordinal()
end_date = datetime.date(2016, 12, 31).toordinal()

# One random date per row; random.randint includes both endpoints
rand_dates = []
for i in range(airbnb_df.shape[0]):
    rand_dates.append(datetime.date.fromordinal(random.randint(start_date, end_date)))
    
airbnb_df['date_listed'] = rand_dates

In [None]:
airbnb_df.to_csv('data/airbnb2.csv', index=False)