# Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

See also (Series versus Python lists): https://discuss.analyticsvidhya.com/t/what-is-the-difference-between-pandas-series-and-python-lists/27373/2

In [2]:
import numpy as np
import pandas as pd
my_numpy_array = np.random.rand(3)
my_first_series = pd.Series(my_numpy_array)
my_first_series

0    0.816826
1    0.227815
2    0.958303
dtype: float64

A Series behaves like a dictionary. You can control the index (the labels) of its elements, so each label acts much like a key in a Python dictionary.

In [5]:
my_series = pd.Series(my_numpy_array, index = ["one", "two", "three"])
my_series["one"]

0.3158976234069526

In [6]:
my_series

one      0.315898
two      0.915814
three    0.716966
dtype: float64

In [7]:
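my_series.iloc[0]  # positional access; plain my_series[0] is deprecated in recent pandas when the index holds labels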

0.3158976234069526

In [8]:
my_series.index

Index(['one', 'two', 'three'], dtype='object')
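
The dictionary analogy can be made concrete with a minimal sketch. The values below are hypothetical and fixed (not the random numbers from the cells above) so that the results are reproducible; it contrasts label-based and position-based access and shows that a dictionary converts directly into a Series.

```python
import pandas as pd

# Hypothetical values chosen for illustration.
s = pd.Series([0.1, 0.2, 0.3], index=["one", "two", "three"])

by_label = s.loc["two"]          # label-based lookup, like a dictionary key
by_position = s.iloc[1]          # position-based lookup, like a list index
has_label = "two" in s           # membership tests check the index labels
fallback = s.get("four", -1.0)   # .get returns a default for a missing label

# A dictionary converts directly: its keys become the index.
from_dict = pd.Series({"one": 0.1, "two": 0.2, "three": 0.3})
same = s.equals(from_dict)

print(by_label, by_position, has_label, fallback, same)
```

Using .loc and .iloc explicitly avoids any ambiguity about whether a key is meant as a label or as a position.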

# DataFrames

In [9]:
my_first_dataframe = pd.DataFrame(np.random.rand(3, 2))

In [10]:
my_first_dataframe

          0         1
0  0.139218  0.977589
1  0.299094  0.912000
2  0.662420  0.468928


In [11]:
array_2d = np.random.rand(3,2)

In [12]:
array_2d

array([[0.75793389, 0.40151499],
       [0.6915229 , 0.03914398],
       [0.29954537, 0.82372595]])

In [13]:
array_2d[0,1]

0.401514991674881

In [30]:
df = pd.DataFrame(array_2d)

In [15]:
df

          0         1
0  0.757934  0.401515
1  0.691523  0.039144
2  0.299545  0.823726


NumPy-style [row, column] indexing does not work directly on a DataFrame: df[0, 1] is treated as a lookup for a column labeled (0, 1).

In [18]:
df[0, 1]

KeyError: (0, 1)
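
For element access on a DataFrame, pandas offers dedicated indexers. The following is a minimal sketch using a small fixed array (an assumption, standing in for the random array_2d above) so that the values are predictable.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones so the results are reproducible.
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
frame = pd.DataFrame(arr)

by_position = frame.iloc[0, 1]   # row 0, column 1 by position
by_label = frame.loc[0, 1]       # row label 0, column label 1
scalar = frame.at[0, 1]          # scalar access by label
chained = frame[1][0]            # select column 1, then row label 0

print(by_position, by_label, scalar, chained)
```

With the default integer labels, position and label happen to coincide; once rows or columns are renamed, only .iloc keeps working with integers.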

In [31]:
df.columns

RangeIndex(start=0, stop=2, step=1)

In [32]:
df.columns = ["first", "second"]

In [21]:
df

      first    second
0  0.757934  0.401515
1  0.691523  0.039144
2  0.299545  0.823726


In [22]:
df["second"]

0    0.401515
1    0.039144
2    0.823726
Name: second, dtype: float64

Note: selecting a single column returns a Series.
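
A short sketch of this distinction, using a hypothetical two-row frame: a single column label yields a Series, while a list of labels yields a DataFrame, even when the list has only one entry.

```python
import pandas as pd

# Hypothetical data for illustration.
frame = pd.DataFrame({"first": [1.0, 2.0], "second": [3.0, 4.0]})

column = frame["second"]        # a single label returns a Series
sub_frame = frame[["second"]]   # a list of labels returns a DataFrame

print(type(column).__name__)
print(type(sub_frame).__name__)
print(column.name)              # the Series keeps the column label as its name
```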

# Loading different types of data into DataFrames

Consider a situation in which the data is stored as JSON, with each record in a separate file. We have to read the object from each file into memory using Python and then hand the collected records to pandas.
pandas provides several constructors of the form pd.DataFrame.from_* for this, such as from_records and from_dict.

In [4]:
records = [("a",1), ("b", 2)]
pd.DataFrame.from_records(records)

   0  1
0  a  1
1  b  2


In [5]:
pd.DataFrame.from_records(records, columns =["x","y"])

   x  y
0  a  1
1  b  2
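
The one-JSON-record-per-file scenario described at the start of this section could be handled roughly as follows. This is a sketch under assumptions: the helper name load_json_records, the file names, and the fields "x" and "y" are all made up for illustration, and the demo writes its own temporary files so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_json_records(directory):
    """Read every *.json file in `directory` (one record per file) into a DataFrame."""
    records = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as handle:
            records.append(json.load(handle))
    return pd.DataFrame.from_records(records)


# Demo with two throwaway files; the field names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for i, record in enumerate([{"x": "a", "y": 1}, {"x": "b", "y": 2}]):
        Path(tmp, f"record_{i}.json").write_text(json.dumps(record))
    loaded = load_json_records(tmp)

print(loaded)
```

Because each record is a dictionary, from_records takes the column names from the keys, so no columns argument is needed here.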


## Creating a DataFrame directly

In [7]:
c = pd.DataFrame(
    [[1, 'a'],
    [2, 'b']]
    )
c

   0  1
0  1  a
1  2  b


In [8]:
c = pd.DataFrame(
    [[1, 'a'],
    [2, 'b']],
    index = ['A','B'],
    columns = ['Id', 'Name']
    )
c

   Id Name
A   1    a
B   2    b
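
Once custom row and column labels are set, they can be used for lookups. The sketch below rebuilds the same small frame and reads from it by label and by position; the variable names name_of_b, first_row, and as_dicts are introduced here for illustration only.

```python
import pandas as pd

c = pd.DataFrame(
    [[1, "a"], [2, "b"]],
    index=["A", "B"],
    columns=["Id", "Name"],
)

name_of_b = c.loc["B", "Name"]    # look up by row label and column label
first_row = c.iloc[0]             # first row by position, returned as a Series
as_dicts = c.to_dict("records")   # back to a list of per-row dictionaries

print(name_of_b)
print(first_row["Id"])
print(as_dicts)
```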
