In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({
    "Date": [
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})

In [3]:
def makechat(row):
    row['chat'] = f"{row.Date} is the date"
    return row

In [5]:
df.transform(makechat, axis=1)

ValueError: Function did not transform
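
The error most likely arises because `transform` expects a result with the same shape and labels as its input, whereas `makechat` adds a new `chat` entry to every row. A minimal sketch of two alternatives, assuming the goal is simply to derive a `chat` column from `Date` (the names `with_chat` and `chat_only` are illustrative, and the frame is a shortened copy of the one above):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1],
})

# Vectorised: build the new column from the existing one; df itself is unchanged.
with_chat = df.assign(chat=df["Date"] + " is the date")

# Row-wise: apply may return one scalar per row, which yields a Series.
chat_only = df.apply(lambda row: f"{row['Date']} is the date", axis=1)
```

`assign` returns a new frame, so the original `df` keeps its two columns; the row-wise `apply` is slower but handles logic that is awkward to vectorise.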

# Series
A `Series` can be thought of as a specialised dictionary or as a generalised 1D NumPy array.

In [7]:
data = pd.Series([0.25, 0.5, 0.375, 1.0], index=[2, 5, 3, 7])

In [8]:
data[5]

0.5
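
Because this index holds integers, `data[5]` is a lookup by label rather than by position. A short sketch that makes the distinction explicit with `.loc` (labels) and `.iloc` (positions); the variable names are illustrative:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.375, 1.0], index=[2, 5, 3, 7])

by_label = data.loc[2]      # value stored under the label 2
by_position = data.iloc[2]  # third value, whatever its label
```

Here the label 2 and the position 2 refer to different entries, which is why the explicit accessors are usually the safer choice with integer indices.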

The constructor is `pd.Series(data, index)`. `data` can be a scalar, which is repeated once for every entry in `index`; a list or NumPy array, in which case the index defaults to a 0-based integer sequence; or a dictionary, in which case the index defaults to the dictionary keys.

In [9]:
pd.Series([2,4,6])

0    2
1    4
2    6
dtype: int64

In [10]:
pd.Series(5, index=[100, 200, 300])

100    5
200    5
300    5
dtype: int64

In [12]:
s = pd.Series({2:'a', 1:'b', 3:'c'})

In [13]:
s[1]

'b'

In each case the index can be set explicitly to control the order or to select a subset of the keys. For example, here only two of the dictionary keys are used, in the order specified.

In [14]:
pd.Series({2:'a', 1:'b', 3:'c'}, index=[1,2])

1    b
2    a
dtype: object
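
The reverse case is also worth a look: a sketch, assuming an explicit index that contains a label (here `4`) that is not a key of the dictionary. pandas keeps the label and marks its value as missing.

```python
import pandas as pd

s = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[1, 2, 4])

# Label 4 has no matching dictionary key, so its value is missing (NaN).
missing = s.isna()
```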

# DataFrames
A `DataFrame` can be considered a specialisation of a dictionary, a generalisation of a NumPy array, or a sequence of aligned `Series`. Here 'aligned' means sharing the same index.

In [19]:
population_dict = {'California': 39538223, 'Texas': 29145505,
                   'Florida': 21538187, 'New York': 20201249,
                   'Pennsylvania': 13002700}
population = pd.Series(population_dict)
population

California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64

In [17]:
area_dict = {'California': 423967, 'Texas': 695662, 'Florida': 170312,
             'New York': 141297, 'Pennsylvania': 119280}
area = pd.Series(area_dict)
area

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
dtype: int64

In [21]:
states = pd.DataFrame({'population': population, 'area': area})
states

              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280
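
Alignment is easiest to see when the two `Series` do not share exactly the same index. A minimal sketch, using deliberately truncated copies of the data above (the name `partial` is illustrative):

```python
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505})
area = pd.Series({'Texas': 695662, 'Florida': 170312})

# Rows are matched by label, not by position; gaps are filled with NaN.
partial = pd.DataFrame({'population': population, 'area': area})
```

Only `Texas` appears in both inputs, so it is the only row with both values present.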


A `DataFrame` can also be thought of as a generalisation of a 2D NumPy array in which both the rows and the columns have a generalised index for accessing the data.

In [23]:
states.columns

Index(['population', 'area'], dtype='object')

In [22]:
states.index

Index(['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'], dtype='object')
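
The two indices can be combined in a single lookup. A brief sketch on a two-state subset of the data above, using `.loc` for labels and `.iloc` for positions (variable names are illustrative):

```python
import pandas as pd

states = pd.DataFrame({
    'population': {'California': 39538223, 'Texas': 29145505},
    'area': {'California': 423967, 'Texas': 695662},
})

texas_area = states.loc['Texas', 'area']  # row label first, then column label
first_row = states.iloc[0]                # first row by position, as a Series
```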

In [24]:
states['area']

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64

For a 2D `NumPy` array, `data[0]` returns the first *row*. For a `DataFrame`, `data['col0']` returns the *column* labelled `col0`.
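
A side-by-side sketch of this difference, assuming a small array wrapped in a `DataFrame` with the hypothetical column names `col0` and `col1`:

```python
import numpy as np
import pandas as pd

arr = np.array([[0, 0], [1, 2], [2, 4]])
frame = pd.DataFrame(arr, columns=['col0', 'col1'])

first_row_np = arr[0]         # NumPy: plain indexing selects a row
first_col_df = frame['col0']  # DataFrame: plain indexing selects a column
first_row_df = frame.iloc[0]  # DataFrame: rows by position need .iloc
```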

In [26]:
# from a list of dicts
data = [{'a': i, 'b': 2*i} for i in range(3)]
pd.DataFrame(data)

   a  b
0  0  0
1  1  2
2  2  4
