**When reading from a file, how do I read in only a subset of the columns?**

In [1]:
import pandas as pd

In [2]:
ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.head(3)

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


In [3]:
type(ufo)

pandas.core.frame.DataFrame

In [5]:
ufo.dtypes

City               object
Colors Reported    object
Shape Reported     object
State              object
Time               object
dtype: object

In [6]:
ufo.ndim

2

In [7]:
ufo.shape

(18241, 5)

In [8]:
ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

**Method 1:**

In [15]:
# specify which columns to include by name
ufo = pd.read_csv("http://bit.ly/uforeports", usecols=['City','State'])
ufo.head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [21]:
ufo.columns

Index(['City', 'State'], dtype='object')

In [19]:
# or equivalently, specify columns by position
ufo = pd.read_csv("http://bit.ly/uforeports", usecols=[0,3])
ufo.head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [20]:
ufo.columns

Index(['City', 'State'], dtype='object')
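
Besides a list of names or positions, `usecols` can also take a callable that is applied to each column name. The sketch below shows this on a small in-memory CSV. The sample rows are hypothetical stand-in data that reuses the column names above, so the example does not need the network.

```python
import io
import pandas as pd

# Hypothetical stand-in data with the same column names as the UFO file
sample_csv = io.StringIO(
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Ithaca,,TRIANGLE,NY,6/1/1930 22:00\n"
    "Willingboro,,OTHER,NJ,6/30/1930 20:00\n"
    "Holyoke,,OVAL,CO,2/15/1931 14:00\n"
)

# usecols also accepts a callable evaluated against each column name;
# columns for which it returns True are kept
wanted = {"City", "State"}
subset = pd.read_csv(sample_csv, usecols=lambda name: name in wanted)
print(subset.columns.tolist())
```

A callable is handy when the columns to keep follow a pattern (for example, a shared prefix) rather than a fixed list.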

**When reading from a file, how do I read in only a subset of the rows?**

In [23]:
# specify how many rows to read
ufo = pd.read_csv("http://bit.ly/uforeports", nrows=3)
ufo

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00


Documentation for [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
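
`nrows` always reads from the top of the file. If the rows you want sit further down, or the file is too large to load at once, `skiprows` and `chunksize` may help. The following is a minimal sketch on a hypothetical in-memory CSV (the `csv_text` contents are illustrative, not the real file).

```python
import io
import pandas as pd

# Hypothetical sample data; line 0 is the header, lines 1-5 are data rows
csv_text = (
    "City,State\n"
    "Ithaca,NY\n"
    "Willingboro,NJ\n"
    "Holyoke,CO\n"
    "Abilene,KS\n"
    "New York Worlds Fair,NY\n"
)

# skip file lines 1 and 2 (the first two data rows) but keep the header,
# then read at most two rows
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3), nrows=2)
print(middle)

# chunksize returns an iterator that yields DataFrames of up to that many rows
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]
print(chunk_sizes)
```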

**How do I iterate through a Series?**

In [24]:
for c in ufo.City:
    print(c)

Ithaca
Willingboro
Holyoke
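
A plain `for` loop over a Series yields only the values. When the index labels are needed too, `Series.items()` yields `(index, value)` pairs. The sketch below builds a small hypothetical Series rather than reading the file.

```python
import pandas as pd

# Hypothetical Series standing in for ufo.City
cities = pd.Series(["Ithaca", "Willingboro", "Holyoke"], name="City")

# items() yields (index label, value) pairs
pairs = []
for idx, city in cities.items():
    pairs.append((idx, city))
    print(idx, city)
```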


**How do I iterate through a DataFrame?**

In [25]:
# iterrows() is one of several ways to iterate through a DataFrame;
# it yields (index, row) pairs, where each row is a Series
for index, row in ufo.iterrows():
    print(index, row.City, row.State)

0 Ithaca NY
1 Willingboro NJ
2 Holyoke CO


Documentation for [iterrows](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html)
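
One alternative to `iterrows` is `itertuples`, which yields one namedtuple per row and is generally faster because it does not build a Series for each row. The sketch below uses a small hypothetical DataFrame (`ufo_sample`) in place of the downloaded data.

```python
import pandas as pd

# Hypothetical three-row stand-in for the ufo DataFrame
ufo_sample = pd.DataFrame(
    {
        "City": ["Ithaca", "Willingboro", "Holyoke"],
        "State": ["NY", "NJ", "CO"],
    }
)

# each row is a namedtuple; the index is exposed as the Index field
lines = []
for row in ufo_sample.itertuples():
    lines.append(f"{row.Index} {row.City} {row.State}")

print("\n".join(lines))
```

Column names that are not valid Python identifiers (such as `Colors Reported`) are replaced with positional names in the namedtuple, so `iterrows` or bracket access may be more convenient in that case.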

**How do I drop all non-numeric columns from a DataFrame?**

In [26]:
# read a dataset of alcohol consumption into a DataFrame, and check the column names
drink = pd.read_csv('http://bit.ly/drinksbycountry')
drink.columns

Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

In [27]:
drink.dtypes

country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

In [28]:
drink.shape

(193, 6)

In [29]:
drink.ndim

2

In [31]:
# only include numeric columns in the DataFrame
import numpy as np

drink.select_dtypes(include=[np.number]).head()

Unnamed: 0,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
0,0,0,0,0.0
1,89,132,54,4.9
2,25,0,14,0.7
3,245,138,312,12.4
4,217,57,45,5.9


In [32]:
drink.select_dtypes(include=[np.number]).dtypes

beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
dtype: object

Documentation for [select_dtypes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html)
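
`select_dtypes` also has an `exclude` parameter, which selects the complementary set of columns. The sketch below shows both directions on a small hypothetical DataFrame (`drinks_sample`) that mimics the structure of the dataset above.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the drinks dataset
drinks_sample = pd.DataFrame(
    {
        "country": ["Afghanistan", "Albania", "Algeria"],
        "beer_servings": [0, 89, 25],
        "total_litres_of_pure_alcohol": [0.0, 4.9, 0.7],
        "continent": ["Asia", "Europe", "Africa"],
    }
)

# keep only the numeric columns
numeric_only = drinks_sample.select_dtypes(include=[np.number])

# keep everything except the numeric columns
non_numeric = drinks_sample.select_dtypes(exclude=[np.number])

print(numeric_only.columns.tolist())
print(non_numeric.columns.tolist())
```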

**How do I know whether I should pass an argument as a string or a list?**

In [33]:
drink.describe()  # by default, describes only the numeric columns

Unnamed: 0,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
count,193.0,193.0,193.0,193.0
mean,106.160622,80.994819,49.450777,4.717098
std,101.143103,88.284312,79.697598,3.773298
min,0.0,0.0,0.0,0.0
25%,20.0,4.0,1.0,1.3
50%,76.0,56.0,8.0,4.2
75%,188.0,128.0,59.0,7.2
max,376.0,438.0,370.0,14.4


In [37]:
# pass the string 'all' to describe all columns
drink.describe(include='all').head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
count,193,193.0,193.0,193.0,193.0,193
unique,193,,,,,6
top,Afghanistan,,,,,Africa
freq,1,,,,,53
mean,,106.160622,80.994819,49.450777,4.717098,


In [38]:
# pass a list of data types to only describe certain types
drink.describe(include=['object', 'float64'])

Unnamed: 0,country,total_litres_of_pure_alcohol,continent
count,193,193.0,193
unique,193,,6
top,Afghanistan,,Africa
freq,1,,53
mean,,4.717098,
std,,3.773298,
min,,0.0,
25%,,1.3,
50%,,4.2,
75%,,7.2,
max,,14.4,


In [42]:
# pass a list even if you only want to describe a single data type
drink.describe(include=['object'])

Unnamed: 0,country,continent
count,193,193
unique,193,6
top,Afghanistan,Africa
freq,1,53


Documentation for [describe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html)
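
`describe` takes an `exclude` parameter as well, and it follows the same list convention as `include`. The sketch below runs both on a small hypothetical DataFrame (`drinks_sample`); the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the drinks dataset
drinks_sample = pd.DataFrame(
    {
        "country": ["Afghanistan", "Albania", "Algeria"],
        "beer_servings": [0, 89, 25],
        "continent": ["Asia", "Europe", "Africa"],
    }
)

# a list of types restricts the summary to numeric columns
summary_numeric = drinks_sample.describe(include=[np.number])

# exclude takes the same kind of list and summarizes everything else
summary_other = drinks_sample.describe(exclude=[np.number])

print(summary_numeric)
print(summary_other)
```

When in doubt about whether a parameter expects a string or a list, the parameter description in the function's documentation is the place to check.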