# Introduction to pandas

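The most popular Python package for manipulating and analyzing data is [pandas](https://pandas.pydata.org). It is built around two structures: the series, a one-dimensional array of values with a labeled index, and the data frame, a table whose columns are series that share one index.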

For example, here is a series of approximate wavelengths of light, in nanometers, corresponding to the colors of the rainbow.

In [13]:
import pandas as pd
# Values are approximate wavelengths in nanometers; the index labels are color names
wave = pd.Series([400, 470, 520, 580, 610, 710], index=["violet", "blue", "green", "yellow", "orange", "red"])
print(wave)

violet    400
blue      470
green     520
yellow    580
orange    610
red       710
dtype: int64


We can now use an index label to look up the corresponding value in the series.

In [14]:
print(wave["blue"])

470
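
As a small sketch of the lookup above, the same value can be reached either by its label or by its position. The variable names `by_label` and `by_position` are only illustrative; `loc` and `iloc` are the explicit label-based and position-based accessors in pandas.

```python
import pandas as pd

wave = pd.Series(
    [400, 470, 520, 580, 610, 710],
    index=["violet", "blue", "green", "yellow", "orange", "red"],
)

# Label-based access: find the entry whose index label is "blue".
by_label = wave.loc["blue"]

# Position-based access: "blue" is the second entry, so its position is 1.
by_position = wave.iloc[1]

print(by_label, by_position)  # 470 470
```

Being explicit with `loc` or `iloc` can help avoid ambiguity when the index labels are themselves integers.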


Passing a list of index labels returns a new series containing only those entries.

In [15]:
print(wave[["violet","red"]])

violet    400
red       710
dtype: int64
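
One detail worth checking for yourself is that the result follows the order of the labels requested, not the order in the original series. The following is a minimal sketch; the name `subset` is just for illustration.

```python
import pandas as pd

wave = pd.Series(
    [400, 470, 520, 580, 610, 710],
    index=["violet", "blue", "green", "yellow", "orange", "red"],
)

# Ask for the labels in the reverse of their original order.
subset = wave[["red", "violet"]]

print(list(subset.index))  # ['red', 'violet']
print(subset.tolist())     # [710, 400]
```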


Here is a second series, this one of NFL team names, that reuses the index of `wave`.

In [16]:
teams = pd.Series(["Vikings", "Bills", "Eagles", "Chargers", "Bengals", "Cardinals"], index=wave.index)
print(teams["green"])

Eagles


Now we can create a data frame using these two series as columns.

In [22]:
rainbow = pd.DataFrame({"wavelength": wave, "team name": teams})
print(rainbow)

        wavelength  team name
violet         400    Vikings
blue           470      Bills
green          520     Eagles
yellow         580   Chargers
orange         610    Bengals
red            710  Cardinals
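
When a data frame is built from series in this way, the columns are matched up by index label rather than by position. The sketch below uses a shortened, two-color version of the data, with the second series deliberately listed in the opposite order, to illustrate the point; the name `frame` is hypothetical.

```python
import pandas as pd

wave = pd.Series([400, 710], index=["violet", "red"])

# Same labels as wave, but listed in the opposite order.
teams = pd.Series(["Cardinals", "Vikings"], index=["red", "violet"])

frame = pd.DataFrame({"wavelength": wave, "team name": teams})

# Each team still lands in the row carrying its own label.
print(frame.loc["red", "team name"])     # Cardinals
print(frame.loc["violet", "team name"])  # Vikings
```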


We can add a column after the fact just by assigning a list of values, one per row and in the existing row order. The new column inherits the index of the current frame.

In [23]:
rainbow["flower"] = ["Lobelia", "Cornflower", "Bells-of-Ireland", "Daffodil", "Butterfly weed", "Rose"]
print(rainbow)

        wavelength  team name            flower
violet         400    Vikings           Lobelia
blue           470      Bills        Cornflower
green          520     Eagles  Bells-of-Ireland
yellow         580   Chargers          Daffodil
orange         610    Bengals    Butterfly weed
red            710  Cardinals              Rose
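
To make the matching rules concrete, here is a minimal sketch on a two-row frame (the name `frame` is hypothetical). A plain list is matched to the rows by position, so its length has to agree with the number of rows, whereas a series is matched by index label.

```python
import pandas as pd

frame = pd.DataFrame({"wavelength": [400, 710]}, index=["violet", "red"])

# A list with the wrong number of values is rejected with a ValueError.
try:
    frame["flower"] = ["Lobelia"]  # one value for two rows
except ValueError as err:
    print("rejected:", err)

# A series is aligned on its index labels, whatever order they come in.
frame["flower"] = pd.Series({"red": "Rose", "violet": "Lobelia"})
print(frame.loc["violet", "flower"])  # Lobelia
```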


Interestingly, a row of the data frame (accessed using `loc` below) is itself a series, indexed by the column names. Because this row mixes numbers and strings, its dtype is reported as `object`.

In [24]:
print(rainbow.loc["orange"])

wavelength               610
team name            Bengals
flower        Butterfly weed
Name: orange, dtype: object
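
The following sketch, on a two-row version of the frame, checks these claims directly: the row is a series, its index is the list of column names, and its name is the row label. It also shows that `loc` accepts a row label and a column label together to pick out a single cell.

```python
import pandas as pd

rainbow = pd.DataFrame(
    {"wavelength": [610, 710], "team name": ["Bengals", "Cardinals"]},
    index=["orange", "red"],
)

row = rainbow.loc["orange"]
print(type(row).__name__)  # Series
print(list(row.index))     # ['wavelength', 'team name']
print(row.name)            # orange
print(row["team name"])    # Bengals

# Row label first, then column label, selects one cell.
print(rainbow.loc["orange", "wavelength"])  # 610
```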


There are many ways to specify values for a data frame. If no explicit index is given, then consecutive integers starting at zero are used. In the example below, each tuple supplies one row.

In [29]:
letters = pd.DataFrame([("a", "A"), ("b", "B"), ("c", "C")], columns=["lowercase", "uppercase"])
print(letters)

  lowercase uppercase
0         a         A
1         b         B
2         c         C
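
As a sketch of two of the other ways, the same frame can be built column by column from a dictionary of lists, or row by row from a list of dictionaries. The variable names here are only illustrative.

```python
import pandas as pd

# Row by row, from tuples plus explicit column names (as above).
from_tuples = pd.DataFrame(
    [("a", "A"), ("b", "B"), ("c", "C")],
    columns=["lowercase", "uppercase"],
)

# Column by column, from a dictionary mapping column names to lists.
from_columns = pd.DataFrame(
    {"lowercase": ["a", "b", "c"], "uppercase": ["A", "B", "C"]}
)

# Row by row, from dictionaries whose keys are the column names.
from_records = pd.DataFrame(
    [
        {"lowercase": "a", "uppercase": "A"},
        {"lowercase": "b", "uppercase": "B"},
        {"lowercase": "c", "uppercase": "C"},
    ]
)

# All three constructions give the same table with the default 0, 1, 2 index.
print(from_tuples.equals(from_columns))
print(from_tuples.equals(from_records))
```

Which form is most convenient usually depends on whether the data arrives organized by rows or by columns.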


With the default index, `loc` selects a row by its integer label.

In [34]:
print(letters.loc[1])

lowercase    b
uppercase    B
Name: 1, dtype: object


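Indexing the frame by a column name returns that column as a series, which carries the frame's index.

In [32]: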
print(letters["uppercase"])

0    A
1    B
2    C
Name: uppercase, dtype: object


For much more about pandas fundamentals, try the [Kaggle course](https://www.kaggle.com/learn/pandas).