**Question:** When reading from a file, how do I read in only a subset of the columns?

In [2]:
import pandas as pd

ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

In [4]:
# specify which columns to include by name
ufo = pd.read_csv('http://bit.ly/uforeports', usecols=['City', 'State'])

# or specify columns by position (positions 0 and 4 select City and Time, not State)
ufo = pd.read_csv('http://bit.ly/uforeports', usecols=[0, 4])
ufo.columns

Index(['City', 'Time'], dtype='object')

In [5]:
ufo.shape

(18241, 2)
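
As a minimal offline sketch, the snippet below reads a small inline sample (the three rows displayed in this notebook, used here as an assumption instead of the full remote file) and checks that selecting by name and by position returns the same columns when the positions actually match the names. `State` sits at position 3 in this layout.

```python
import io
import pandas as pd

# Inline sample with the same column layout as the UFO dataset (assumption: 3 rows only)
csv_text = """City,Colors Reported,Shape Reported,State,Time
Ithaca,,TRIANGLE,NY,6/1/1930 22:00
Willingboro,,OTHER,NJ,6/30/1930 20:00
Holyoke,,OVAL,CO,2/15/1931 14:00
"""

# Select the same two columns by name and by zero-based position
by_name = pd.read_csv(io.StringIO(csv_text), usecols=['City', 'State'])
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])

print(by_name.columns.tolist())
print(by_name.equals(by_position))
```

Selecting by name is usually more robust, since it keeps working if the column order in the file changes.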

**Question:** When reading from a file, how do I read in only a subset of the rows?

In [8]:
# specify how many rows to read
ufo = pd.read_csv('http://bit.ly/uforeports', nrows=3)
ufo

|  | City | Colors Reported | Shape Reported | State | Time |
|---|---|---|---|---|---|
| 0 | Ithaca | NaN | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro | NaN | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke | NaN | OVAL | CO | 2/15/1931 14:00 |


Documentation for [**`read_csv`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)
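
`nrows` always counts from the top of the file. A rough sketch of reading a slice that starts later, assuming a small inline sample in place of the remote file, is to combine it with the `skiprows` parameter of `read_csv`, which takes zero-based line numbers (line 0 is the header here):

```python
import io
import pandas as pd

# Inline sample standing in for the remote file (assumption: 3 data rows)
csv_text = """City,State
Ithaca,NY
Willingboro,NJ
Holyoke,CO
"""

# First two data rows
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Skip file line 1 (the first data row), keep the header, then read one row
second_only = pd.read_csv(io.StringIO(csv_text), skiprows=[1], nrows=1)

print(first_two['City'].tolist())
print(second_only['City'].tolist())
```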

**Question:** How do I iterate through a Series?

In [18]:
# Series are directly iterable (like a list)
for c in ufo.City:
    print(c)

print("\n\n")

for index,val in enumerate(ufo.City):    # behaving like numpy array
    print(index,val)
    
print("\n\n")
    
for index,val in ufo.City.items():   # behaving like a dictionary (iteritems() was removed in pandas 2.0)
    print(index,val)

Ithaca
Willingboro
Holyoke



0 Ithaca
1 Willingboro
2 Holyoke



0 Ithaca
1 Willingboro
2 Holyoke
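
The last two loops print identical output only because the default index happens to be 0, 1, 2. As a small sketch with a hypothetical Series that uses state codes as index labels, the difference becomes visible: `enumerate` yields positions, while `Series.items()` yields index labels.

```python
import pandas as pd

# Hypothetical Series: same cities, but labelled by state instead of 0, 1, 2
cities = pd.Series(['Ithaca', 'Willingboro', 'Holyoke'], index=['NY', 'NJ', 'CO'])

# enumerate pairs each value with its position
by_position = list(enumerate(cities))

# items() pairs each value with its index label
by_label = list(cities.items())

print(by_position)
print(by_label)
```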


**Question:** How do I iterate through a DataFrame?

In [23]:
# various methods are available to iterate through a DataFrame
for index,row in ufo.iterrows():
    print(index, row.City, row.State)

0 Ithaca NY
1 Willingboro NJ
2 Holyoke CO


Documentation for [**`iterrows`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html)
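
One of the other available methods is `DataFrame.itertuples`, which yields each row as a namedtuple whose first field is `Index`. The sketch below builds a small hypothetical DataFrame by hand rather than reading the UFO file:

```python
import pandas as pd

# Hypothetical stand-in for the first three rows of the UFO data
df = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'State': ['NY', 'NJ', 'CO'],
})

# Each row is a namedtuple: row.Index is the index label, columns are attributes
rows = [(row.Index, row.City, row.State) for row in df.itertuples()]

for index, city, state in rows:
    print(index, city, state)
```

Attribute access like `row.City` works only for column names that are valid Python identifiers, so a name such as `Colors Reported` would need a different approach.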

**Question:** How do I drop all non-numeric columns from a DataFrame?

In [None]:
# read a dataset of alcohol consumption into a DataFrame, and check the data types
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.dtypes

In [25]:
# only include numeric columns in the DataFrame
import numpy as np
drinks.select_dtypes(include=[np.number]).dtypes

beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
dtype: object

Documentation for [**`select_dtypes`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html)
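
`select_dtypes` also accepts an `exclude` argument, which gives the complementary selection. A minimal sketch on a hand-built DataFrame (column names borrowed from the drinks data, values made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names mirror the drinks dataset
df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'beer_servings': [0, 89, 25],
    'total_litres_of_pure_alcohol': [0.0, 4.9, 0.7],
    'continent': ['Asia', 'Europe', 'Africa'],
})

# Keep only numeric columns, or keep everything except numeric columns
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
other_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(numeric_cols)
print(other_cols)
```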

**Question:** How do I know whether I should pass an argument as a string or a list?

In [26]:
# describe all of the numeric columns
drinks.describe()

|  | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---|---|---|---|
| count | 193.0 | 193.0 | 193.0 | 193.0 |
| mean | 106.160622 | 80.994819 | 49.450777 | 4.717098 |
| std | 101.143103 | 88.284312 | 79.697598 | 3.773298 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 20.0 | 4.0 | 1.0 | 1.3 |
| 50% | 76.0 | 56.0 | 8.0 | 4.2 |
| 75% | 188.0 | 128.0 | 59.0 | 7.2 |
| max | 376.0 | 438.0 | 370.0 | 14.4 |


In [27]:
# pass the string 'all' to describe all columns
drinks.describe(include='all')

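|  | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| count | 193 | 193.0 | 193.0 | 193.0 | 193.0 | 193 |
| unique | 193 | NaN | NaN | NaN | NaN | 6 |
| top | Samoa | NaN | NaN | NaN | NaN | Africa |
| freq | 1 | NaN | NaN | NaN | NaN | 53 |
| mean | NaN | 106.160622 | 80.994819 | 49.450777 | 4.717098 | NaN |
| std | NaN | 101.143103 | 88.284312 | 79.697598 | 3.773298 | NaN |
| min | NaN | 0.0 | 0.0 | 0.0 | 0.0 | NaN |
| 25% | NaN | 20.0 | 4.0 | 1.0 | 1.3 | NaN |
| 50% | NaN | 76.0 | 56.0 | 8.0 | 4.2 | NaN |
| 75% | NaN | 188.0 | 128.0 | 59.0 | 7.2 | NaN |
| max | NaN | 376.0 | 438.0 | 370.0 | 14.4 | NaN |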


In [28]:
# pass a list of data types to only describe certain types
drinks.describe(include=['object', 'float64'])

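|  | country | total_litres_of_pure_alcohol | continent |
|---|---|---|---|
| count | 193 | 193.0 | 193 |
| unique | 193 | NaN | 6 |
| top | Samoa | NaN | Africa |
| freq | 1 | NaN | 53 |
| mean | NaN | 4.717098 | NaN |
| std | NaN | 3.773298 | NaN |
| min | NaN | 0.0 | NaN |
| 25% | NaN | 1.3 | NaN |
| 50% | NaN | 4.2 | NaN |
| 75% | NaN | 7.2 | NaN |
| max | NaN | 14.4 | NaN |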


In [29]:
# pass a list even if you only want to describe a single data type
drinks.describe(include=['object'])

|  | country | continent |
|---|---|---|
| count | 193 | 193 |
| unique | 193 | 6 |
| top | Samoa | Africa |
| freq | 1 | 53 |


Documentation for [**`describe`**](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html)
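
The pattern in the cells above is that `'all'` is a special string value, while data types are passed as a list. The following sketch repeats the comparison on a small hand-built DataFrame (hypothetical values, not the drinks data) and inspects only which columns each call describes:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one text, one integer, and one float column
df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'beer_servings': [0, 89, 25],
    'total_litres_of_pure_alcohol': [0.0, 4.9, 0.7],
})

# The special string 'all' describes every column
all_cols = sorted(df.describe(include='all').columns)

# A list of dtypes restricts the summary to matching columns
numeric_cols = sorted(df.describe(include=[np.number]).columns)
float_cols = sorted(df.describe(include=['float64']).columns)

print(all_cols)
print(numeric_cols)
print(float_cols)
```

When in doubt, the method's docstring (for example via `help(df.describe)`) states which form each parameter expects.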
