In [1]:
import pandas as pd

## Read From CSV File

In [2]:
df = pd.read_csv("data/parks.csv")

In [3]:
df.head()

|   | Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | ACAD | Acadia National Park | ME | 47390 | 44.35 | -68.21 |
| 1 | LKAD | NULL National Park | NaN | 47390 | NaN | -68.21 |
| 2 | ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| 3 | BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |
| 4 | BIBE | Big Bend National Park | TX | 801163 | 29.25 | -103.25 |
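In case data/parks.csv isn't at hand, here is a minimal sketch that feeds three of the rows above to read_csv from an in-memory string (io.StringIO stands in for the file). It shows how the blank State and Latitude fields in the LKAD row become NaN, while the text "NULL National Park" is kept as an ordinary name.

```python
import io

import pandas as pd

# A few rows copied from the head() output above, held in memory as CSV
# text so the sketch runs without data/parks.csv on disk.
csv_text = """Park Code,Park Name,State,Acres,Latitude,Longitude
ACAD,Acadia National Park,ME,47390,44.35,-68.21
LKAD,NULL National Park,,47390,,-68.21
ARCH,Arches National Park,UT,76519,38.68,-109.57
"""

df = pd.read_csv(io.StringIO(csv_text))

# Empty fields become NaN. "NULL National Park" stays an ordinary string,
# because read_csv only treats a whole field such as "NULL" as missing.
print(df.isnull().sum())

# Latitude contains a NaN, so it is parsed as float; Acres has no gaps
# and is parsed as an integer column.
print(df.dtypes)
```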


## Slicing the Data Frame

In [4]:
X = df[2:4]  # rows at positions 2 and 3; the stop position is excluded

In [5]:
X

|   | Park Code | Park Name | State | Acres | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 2 | ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| 3 | BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |
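A plain integer slice on a DataFrame selects rows by position, exactly like Python list slicing. The sketch below, built on a small hand-made stand-in for the first five rows, compares it with the explicit .iloc (positional) and .loc (label) forms. With the default integer index, positions and labels coincide, so .loc[2:3] returns the same rows even though .loc includes the end label; with any other index the two would differ.

```python
import pandas as pd

# Stand-in for the first five rows shown above.
df = pd.DataFrame({
    "Park Code": ["ACAD", "LKAD", "ARCH", "BADL", "BIBE"],
    "Acres": [47390, 47390, 76519, 242756, 801163],
})

by_slice = df[2:4]      # integer slice on rows: positions 2 and 3, stop excluded
by_iloc = df.iloc[2:4]  # same rows, with the positional intent made explicit
by_loc = df.loc[2:3]    # label-based: both endpoints are included

print(by_slice)
print(by_slice.equals(by_iloc), by_slice.equals(by_loc))
```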


## Indexing Columns

In [6]:
df['State'].head()

0     ME
1    NaN
2     UT
3     SD
4     TX
Name: State, dtype: object

In [7]:
df.State.head()

0     ME
1    NaN
2     UT
3     SD
4     TX
Name: State, dtype: object
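Both cells above return the same Series. Attribute access has one more pitfall besides spaces: a column whose name matches a DataFrame method is hidden behind that method. The sketch uses a hypothetical column called "count" to show this.

```python
import pandas as pd

# "count" is a hypothetical column name chosen to collide with a method.
df = pd.DataFrame({"State": ["ME", "UT", "SD"], "count": [3, 5, 1]})

# For an ordinary name, both access styles return the same Series.
print(df["State"].equals(df.State))

# df.count resolves to the DataFrame.count method, not the column, because
# normal attribute lookup finds the method before pandas checks the columns.
print(callable(df.count))
print(df["count"].tolist())
```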

Because 'Park Code' contains a space, it is not a valid Python identifier, so writing df.Park Code is a syntax error. That column can only be accessed by passing its name as a string in brackets, like df['Park Code']. I recommend either always using the bracket approach or converting your column names into a valid format as soon as you read in the data, so that you don't have to mix the two methods. It's just a bit tidier.
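A quick way to see which columns force the bracket form is to check each name with str.isidentifier() and keyword.iskeyword(). The helper needs_brackets below is a hypothetical name for this sketch; note that it does not flag names that collide with DataFrame methods, which also cannot be reached as attributes.

```python
import keyword

import pandas as pd

# Column names as they come out of parks.csv (see the head() output above).
df = pd.DataFrame(columns=["Park Code", "Park Name", "State", "Acres", "Latitude", "Longitude"])


def needs_brackets(name):
    """Hypothetical helper: True if df.<name> can't be written in Python source."""
    return not (isinstance(name, str) and name.isidentifier() and not keyword.iskeyword(name))


print([col for col in df.columns if needs_brackets(col)])
```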

It's good practice to clean your column names right after loading, so that every column can be reached with either method. I'll use a very short cleaning function here, since the only problem characters in these names are spaces. By convention, the names are also converted to lower case. Column names in pandas are case sensitive, so every later reference to a column has to use its new lower-case name (df.state rather than df.State).

In [8]:
df.columns = [col.replace(' ', '_').lower() for col in df.columns]
print(df.columns)

Index(['park_code', 'park_name', 'state', 'acres', 'latitude', 'longitude'], dtype='object')
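If a file had messier headers (punctuation, stray spaces), a slightly more general cleaner could be passed to DataFrame.rename, which accepts a function for columns=. This is a sketch under assumptions: clean_column_name is a hypothetical helper, and the headers "Acres (total)" and " Latitude " are made up for illustration; they are not in parks.csv.

```python
import re

import pandas as pd


def clean_column_name(name):
    """Hypothetical helper: replace every run of characters other than letters
    and digits with one underscore, trim stray underscores, lower-case."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name))
    return name.strip("_").lower()


# "Acres (total)" and " Latitude " are made-up messy names for illustration.
df = pd.DataFrame(columns=["Park Code", "Park Name", "Acres (total)", " Latitude "])
df = df.rename(columns=clean_column_name)
print(list(df.columns))
```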


## Indexing Columns & Rows

In [9]:
df[['state', 'acres']][:3]

|   | state | acres |
|---|---|---|
| 0 | ME | 47390 |
| 1 | NaN | 47390 |
| 2 | UT | 76519 |
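The cell above selects columns and then slices rows in two chained steps. A single .loc (labels) or .iloc (positions) call does both at once, and that single-call form is what pandas recommends if you later assign values, since a chained selection may operate on a copy. The DataFrame here is a small hand-built stand-in for the first four rows.

```python
import pandas as pd

# Stand-in for the first four rows of the cleaned DataFrame.
df = pd.DataFrame({
    "park_code": ["ACAD", "LKAD", "ARCH", "BADL"],
    "state": ["ME", None, "UT", "SD"],
    "acres": [47390, 47390, 76519, 242756],
})

chained = df[["state", "acres"]][:3]     # form used above: columns, then rows
by_loc = df.loc[:2, ["state", "acres"]]  # labels 0..2, end label included
by_iloc = df.iloc[:3, [1, 2]]            # positions 0..2, columns 1 and 2

print(by_loc)
print(chained.equals(by_loc), chained.equals(by_iloc))
```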


## Selecting a Subset of Data

In [10]:
(df.state == 'UT').head(3)

0    False
1    False
2     True
Name: state, dtype: bool

The comparison returns a boolean Series with one True or False value per row. Passing that Series inside the DataFrame's brackets (boolean indexing) returns only the rows where it is True.

In [11]:
df[df.state == 'UT']

|   | park_code | park_name | state | acres | latitude | longitude |
|---|---|---|---|---|---|---|
| 2 | ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| 7 | BRCA | Bryce Canyon National Park | UT | 35835 | 37.57 | -112.18 |
| 8 | CANY | Canyonlands National Park | UT | 337598 | 38.2 | -109.93 |
| 9 | CARE | Capitol Reef National Park | UT | 241904 | 38.2 | -111.17 |
| 56 | ZION | Zion National Park | UT | 146598 | 37.3 | -113.05 |
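Storing the boolean Series in a variable makes it reusable: summing it counts the matching rows (True counts as 1), and .loc accepts it as the row selector together with a list of columns. The DataFrame below is a small hand-built sample of rows from the outputs above.

```python
import pandas as pd

# A handful of rows from the outputs above.
df = pd.DataFrame({
    "park_name": ["Acadia National Park", "Arches National Park", "Badlands National Park",
                  "Bryce Canyon National Park", "Zion National Park"],
    "state": ["ME", "UT", "SD", "UT", "UT"],
    "acres": [47390, 76519, 242756, 35835, 146598],
})

in_utah = df.state == "UT"  # boolean Series, one value per row
print(in_utah.sum())        # True counts as 1, so this is the number of matching rows

# .loc takes the same mask for the rows plus a list of columns to keep.
print(df.loc[in_utah, ["park_name", "acres"]])
```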


In [12]:
df[(df.latitude > 60) | (df.acres > 10**6)].head(3)

|   | park_code | park_name | state | acres | latitude | longitude |
|---|---|---|---|---|---|---|
| 15 | DENA | Denali National Park and Preserve | AK | 3372402 | 63.33 | -150.5 |
| 16 | DEVA | Death Valley National Park | CA, NV | 4740912 | 36.24 | -116.82 |
| 18 | EVER | Everglades National Park | FL | 1508538 | 25.32 | -80.93 |
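Conditions are combined element-wise with | (or), & (and) and ~ (not), not with Python's or/and/not keywords. Each comparison needs its own parentheses because | and & bind more tightly than >; without them, Python tries to evaluate a chained comparison on a whole Series and pandas raises ValueError. The sketch uses a few rows copied from the outputs above.

```python
import pandas as pd

# A few rows copied from the outputs above.
df = pd.DataFrame({
    "park_code": ["ACAD", "DENA", "DEVA", "EVER", "ZION"],
    "acres": [47390, 3372402, 4740912, 1508538, 146598],
    "latitude": [44.35, 63.33, 36.24, 25.32, 37.3],
})

north = df.latitude > 60
huge = df.acres > 10**6

either = df[north | huge]      # OR
both = df[north & huge]        # AND
neither = df[~(north | huge)]  # NOT

print(either.park_code.tolist(), both.park_code.tolist(), neither.park_code.tolist())

# Without parentheses this parses as df.latitude > (60 | df.acres) > 10**6,
# a chained comparison that needs the truth value of a whole Series.
raised = False
try:
    df[df.latitude > 60 | df.acres > 10**6]
except ValueError:
    raised = True  # "The truth value of a Series is ambiguous..."
```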


In [13]:
df.transpose()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| park_code | ACAD | LKAD | ARCH | BADL | BIBE | BISC | BLCA | BRCA | CANY | CARE | ... | SAGU | SEKI | SHEN | THRO | VOYA | WICA | WRST | YELL | YOSE | ZION |
| park_name | Acadia National Park | NULL National Park | Arches National Park | Badlands National Park | Big Bend National Park | Biscayne National Park | Black Canyon of the Gunnison National Park | Bryce Canyon National Park | Canyonlands National Park | Capitol Reef National Park | ... | Saguaro National Park | Sequoia and Kings Canyon National Parks | Shenandoah National Park | Theodore Roosevelt National Park | Voyageurs National Park | Wind Cave National Park | Wrangell - St Elias National Park and Preserve | Yellowstone National Park | Yosemite National Park | Zion National Park |
| state | ME | NaN | UT | SD | TX | FL | CO | UT | UT | UT | ... | AZ | CA | VA | ND | MN | SD | AK | WY, MT, ID | CA | UT |
| acres | 47390 | 47390 | 76519 | 242756 | 801163 | 172924 | 32950 | 35835 | 337598 | 241904 | ... | 91440 | 865952 | 199045 | 70447 | 218200 | 28295 | 8323148 | 2219791 | 761266 | 146598 |
| latitude | 44.35 | NaN | 38.68 | 43.75 | 29.25 | 25.65 | 38.57 | 37.57 | 38.2 | 38.2 | ... | 32.25 | 36.43 | 38.53 | 46.97 | 48.5 | 43.57 | 61 | 44.6 | 37.83 | 37.3 |
| longitude | -68.21 | -68.21 | -109.57 | -102.5 | -103.25 | -80.08 | -107.72 | -112.18 | -109.93 | -111.17 | ... | -110.5 | -118.68 | -78.35 | -103.45 | -92.88 | -103.48 | -142 | -110.5 | -119.5 | -113.05 |
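df.T is shorthand for df.transpose(). One side effect worth knowing: after transposing, each new column mixes strings and numbers, so pandas stores every column as object dtype, and transposing back does not restore the original dtypes by itself. This sketch uses two hand-picked rows and calls infer_objects() to re-derive the numeric dtypes.

```python
import pandas as pd

df = pd.DataFrame({
    "park_code": ["ACAD", "ARCH"],
    "acres": [47390, 76519],
    "latitude": [44.35, 38.68],
})

t = df.T  # .T is shorthand for .transpose()
print(t.equals(df.transpose()))

# Each transposed column mixes strings, ints and floats, so pandas
# stores every column as object dtype.
print(t.dtypes)

# Transposing back keeps the object dtype; infer_objects() re-derives
# numeric dtypes where the values allow it.
restored = t.T.infer_objects()
print(restored.dtypes)
```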


## Identifying NULL Values in Columns

In [14]:
df[df.state.isnull()]

|   | park_code | park_name | state | acres | latitude | longitude |
|---|---|---|---|---|---|---|
| 1 | LKAD | NULL National Park | NaN | 47390 | NaN | -68.21 |
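To survey missing values across the whole frame rather than one column, isnull() can be summed per column or reduced per row with any(axis=1). isna() is an alias of isnull(). The DataFrame below is a small stand-in for rows 0 to 2, including the placeholder LKAD row.

```python
import numpy as np
import pandas as pd

# Rows 0-2 from the output above; row 1 is the placeholder park with gaps.
df = pd.DataFrame({
    "park_code": ["ACAD", "LKAD", "ARCH"],
    "state": ["ME", np.nan, "UT"],
    "latitude": [44.35, np.nan, 38.68],
})

print(df.isnull().sum())            # number of missing values in each column
print(df[df.isnull().any(axis=1)])  # rows with a gap in any column
print(df.state.isna().equals(df.state.isnull()))  # isna() is an alias of isnull()
```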


## Dropping Null Values

In [15]:
df.dropna()

|   | park_code | park_name | state | acres | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | ACAD | Acadia National Park | ME | 47390 | 44.35 | -68.21 |
| 2 | ARCH | Arches National Park | UT | 76519 | 38.68 | -109.57 |
| 3 | BADL | Badlands National Park | SD | 242756 | 43.75 | -102.5 |
| 4 | BIBE | Big Bend National Park | TX | 801163 | 29.25 | -103.25 |
| 5 | BISC | Biscayne National Park | FL | 172924 | 25.65 | -80.08 |
| 6 | BLCA | Black Canyon of the Gunnison National Park | CO | 32950 | 38.57 | -107.72 |
| 7 | BRCA | Bryce Canyon National Park | UT | 35835 | 37.57 | -112.18 |
| 8 | CANY | Canyonlands National Park | UT | 337598 | 38.2 | -109.93 |
| 9 | CARE | Capitol Reef National Park | UT | 241904 | 38.2 | -111.17 |
| 10 | CAVE | Carlsbad Caverns National Park | NM | 46766 | 32.17 | -104.44 |
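Note that dropna() returns a new DataFrame and leaves df untouched, so the result has to be assigned to keep it. It also takes subset= to look for NaN only in certain columns and how="all" to drop a row only when every value is missing. The sketch below uses a three-row stand-in for the parks data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "park_code": ["ACAD", "LKAD", "ARCH"],
    "state": ["ME", np.nan, "UT"],
    "latitude": [44.35, np.nan, 38.68],
    "longitude": [-68.21, -68.21, -109.57],
})

cleaned = df.dropna()                     # a new DataFrame; df is unchanged
print(len(df), len(cleaned))

only_state = df.dropna(subset=["state"])  # look for NaN only in "state"
all_missing = df.dropna(how="all")        # drop a row only if every value is NaN

df = df.dropna()                          # reassign to keep the result
```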
