# Creating Series and Data Frames in Pandas

Pandas provides two main classes for handling data:

- `Series`: a one-dimensional labeled array that can hold data of any type, such as integers, strings, or Python objects.
- `DataFrame`: a two-dimensional labeled data structure, similar to a table with rows and columns or a two-dimensional array with labeled axes.

In this recipe, you will learn how to create objects of these types.
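As a quick illustration of the difference in dimensionality, the `ndim` and `shape` attributes can be compared on a small `Series` and a small `DataFrame`. The values below are arbitrary and chosen only for the example.

```python
import pandas as pd

# A Series holds a single labeled sequence of values
s = pd.Series([10, 20, 30])
# A DataFrame holds values arranged in labeled rows and columns
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (2, 2)
```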

Import the `pandas` and `numpy` libraries.

In [1]:
import pandas as pd
import numpy as np

Create a one-dimensional array of five random numbers drawn from a uniform distribution over the interval $[0, 1)$.
To do this, use the function `rand` from the `numpy.random` module; its arguments give the shape of the resulting array.

In [2]:
random_numbers = np.random.rand(5)
random_numbers

array([0.87094373, 0.36844403, 0.34383077, 0.82472597, 0.38817593])

In [4]:
type(random_numbers)

numpy.ndarray
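Because `rand` takes one integer per dimension, the same function can build multi-dimensional arrays. The numbers also change on every run, which is why the outputs in this recipe will differ from yours. The sketch below additionally shows one way to get reproducible numbers, using NumPy's `default_rng` Generator API with an arbitrary seed (42); this seeding step is an addition and is not used elsewhere in the recipe.

```python
import numpy as np

# One argument per dimension: 2 rows x 3 columns
matrix = np.random.rand(2, 3)
print(matrix.shape)  # (2, 3)

# Every value lies in the half-open interval [0, 1)
print(((matrix >= 0) & (matrix < 1)).all())  # True

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
print(np.array_equal(rng_a.random(5), rng_b.random(5)))  # True
```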

## Series

Create a pandas `Series` object from a `numpy.ndarray` of random numbers.

In [3]:
series = pd.Series(random_numbers)
series

0    0.870944
1    0.368444
2    0.343831
3    0.824726
4    0.388176
dtype: float64

In [4]:
type(series)

pandas.core.series.Series
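When no `index` is given, pandas labels the elements with a default integer `RangeIndex` that starts at 0, and the `Series` keeps the dtype of the NumPy array. A short check, using a fixed array instead of random numbers so that the result is predictable:

```python
import numpy as np
import pandas as pd

values = np.array([0.5, 0.25, 0.125])
series = pd.Series(values)

# Default labels are 0, 1, 2, ... stored as a RangeIndex
print(series.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(series.index))  # [0, 1, 2]

# The Series keeps the dtype of the source array
print(series.dtype)        # float64
```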

Create a pandas `Series` object from a `numpy.ndarray` and specify its labels with the `index` argument.

In [6]:
series = pd.Series(random_numbers, index=["a", "b", "c", "d", "e"])
series

a    0.870944
b    0.368444
c    0.343831
d    0.824726
e    0.388176
dtype: float64
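Custom labels make it possible to select elements by name. The index must contain exactly as many labels as there are values; otherwise pandas raises a `ValueError`. A sketch with fixed values:

```python
import numpy as np
import pandas as pd

series = pd.Series(np.array([1.0, 2.0, 3.0]), index=["a", "b", "c"])

# Select a single value, or several values, by label
print(series["b"])                      # 2.0
print(series.loc[["a", "c"]].tolist())  # [1.0, 3.0]

# Three values but only two labels: rejected
try:
    pd.Series(np.array([1.0, 2.0, 3.0]), index=["a", "b"])
except ValueError as err:
    print("ValueError:", err)
```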

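Create a pandas `Series` from a dictionary. In this case, the dictionary keys become the index labels, in insertion order. Note that `None` is converted to `NaN` and the integer values are upcast to `float64`, because `NaN` is a floating-point value.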

In [5]:
d = {"b": 1, "a": 0, "c": 2, "d": None}
series = pd.Series(d)
series


b    1.0
a    0.0
c    2.0
d    NaN
dtype: float64
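When a dictionary is combined with an explicit `index`, pandas takes the values whose keys appear in the index, in the order of the index, and fills any label without a matching key with `NaN`. The keys and labels below are arbitrary.

```python
import pandas as pd

d = {"a": 0, "b": 1, "c": 2}
# "x" is not a key of d, so its value becomes NaN; "b" is left out
series = pd.Series(d, index=["c", "a", "x"])
print(series)
# c    2.0
# a    0.0
# x    NaN
# dtype: float64
```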

## DataFrames
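Create a `DataFrame` from a dictionary of pandas `Series` objects. The dictionary keys become the column labels, and the row index is the union of the indexes of the individual `Series`, so a label missing from one `Series` appears as `NaN` in that column.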

In [6]:
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| a | 1.0 | 1.0 |
| b | 2.0 | 2.0 |
| c | 3.0 | 3.0 |
| d | NaN | 4.0 |
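The check below confirms that row `d` is missing from the `one` column, and shows that passing `index` to the constructor selects and orders the rows to keep. It rebuilds the same dictionary so that it runs on its own.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

df = pd.DataFrame(d)
print(df.index.tolist())          # ['a', 'b', 'c', 'd']
print(df["one"].isna().tolist())  # [False, False, False, True]

# Keep only the listed rows, in the listed order
subset = pd.DataFrame(d, index=["d", "b", "a"])
print(subset)
```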


Create a `DataFrame` from a dictionary whose values are array-like objects such as lists, tuples, or NumPy arrays (`numpy.ndarray`). Without an explicit `index`, the rows are labeled 0, 1, 2, and so on.

In [7]:
d = {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| 0 | 1.0 | 4.0 |
| 1 | 2.0 | 3.0 |
| 2 | 3.0 | 2.0 |
| 3 | 4.0 | 1.0 |


In [8]:
d = {"one": (1.0, 2.0, 3.0, 4.0), "two": (4.0, 3.0, 2.0, 1.0)}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| 0 | 1.0 | 4.0 |
| 1 | 2.0 | 3.0 |
| 2 | 3.0 | 2.0 |
| 3 | 4.0 | 1.0 |
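All lists, tuples, or arrays in the dictionary must have the same length; otherwise pandas raises a `ValueError`. A scalar value, by contrast, is repeated for every row. The column names below are made up for the example.

```python
import pandas as pd

# Unequal lengths cannot be combined without an index to align on
try:
    pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 3.0]})
except ValueError as err:
    print("ValueError:", err)

# A scalar is broadcast to every row
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "flag": True})
print(df["flag"].tolist())  # [True, True, True]
```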


In [10]:
random_numbers1 = np.random.rand(5)
random_numbers2 = np.random.rand(5)
d = {"one": random_numbers1, "two": random_numbers2}
df = pd.DataFrame(d)
df

|   | one | two |
|---|-----|-----|
| 0 | 0.120951 | 0.267698 |
| 1 | 0.287852 | 0.399872 |
| 2 | 0.911299 | 0.70034 |
| 3 | 0.511067 | 0.596419 |
| 4 | 0.0422 | 0.743313 |
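Each NumPy array becomes one column, so the array length sets the number of rows, and the columns follow the dictionary's insertion order. Passing `columns` lets you choose a different order. A sketch:

```python
import numpy as np
import pandas as pd

d = {"one": np.random.rand(5), "two": np.random.rand(5)}

df = pd.DataFrame(d)
print(df.columns.tolist())  # ['one', 'two']
print(df.shape)             # (5, 2)

# List the keys explicitly to reorder the columns
reordered = pd.DataFrame(d, columns=["two", "one"])
print(reordered.columns.tolist())  # ['two', 'one']
```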


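Create a `DataFrame` from a two-dimensional `numpy.ndarray`. Here `np.random.randn` draws samples from the standard normal distribution; because no labels are given, both the rows and the columns are numbered from 0.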

In [11]:
data = np.random.randn(10, 4)
df = pd.DataFrame(data)
df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | -1.142535 | -0.245003 | -0.851253 | -0.148273 |
| 1 | 0.213844 | -1.838625 | 0.838224 | 0.669171 |
| 2 | -1.153421 | -1.967507 | 1.488544 | -0.321826 |
| 3 | -1.839831 | 0.026743 | 0.487918 | 1.432061 |
| 4 | -1.361442 | 2.13889 | 0.512439 | -0.411154 |
| 5 | 0.199373 | -0.477661 | -0.237204 | -0.897375 |
| 6 | -0.994734 | -0.704414 | 0.077252 | 0.71955 |
| 7 | 2.036445 | -0.52458 | -0.17567 | 0.099778 |
| 8 | -1.389724 | 0.961666 | -1.03605 | -0.390783 |
| 9 | 0.213573 | 1.442763 | 0.743532 | -1.500745 |
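pandas can also build a `DataFrame` from a structured (record) NumPy array. In that case the field names become the column labels and each field keeps its own type. The field names and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# A structured array: each element is a record with named, typed fields
records = np.array(
    [(1, 2.5, "x"), (2, 3.5, "y")],
    dtype=[("id", "i4"), ("score", "f8"), ("label", "U1")],
)

df = pd.DataFrame(records)
print(df.columns.tolist())  # ['id', 'score', 'label']
print(df.dtypes)
```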


Specify the column names by passing the `columns` argument.

In [12]:
data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -0.685967 | -0.226106 | -0.155559 | -0.859795 |
| 1 | 0.050424 | -1.905684 | 1.042699 | -2.392223 |
| 2 | -0.212897 | 0.940746 | -0.999252 | -0.676588 |
| 3 | 1.384519 | 1.696834 | -0.064091 | -0.207688 |
| 4 | -0.802295 | 1.210706 | -0.085844 | 0.699038 |
| 5 | -0.23913 | -1.992698 | 0.65811 | 1.662385 |
| 6 | 0.016091 | 0.505621 | 2.105684 | -0.561361 |
| 7 | 0.982224 | 0.401623 | -0.596864 | -0.88007 |
| 8 | -0.50254 | -0.715308 | -0.256149 | 0.599689 |
| 9 | -1.372892 | -0.478353 | -0.24759 | 0.586483 |
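The number of names in `columns` must match the number of columns in the array. Once named, a column can be selected with `df["A"]`, which returns a `Series`, while a list of names returns a smaller `DataFrame`. A sketch using zeros so that the values are predictable:

```python
import numpy as np
import pandas as pd

data = np.zeros((3, 4))
df = pd.DataFrame(data, columns=["A", "B", "C", "D"])

# One name returns a Series; a list of names returns a DataFrame
print(type(df["A"]).__name__)  # Series
print(df[["A", "C"]].shape)    # (3, 2)

# Three names for four columns is an error
try:
    pd.DataFrame(data, columns=["A", "B", "C"])
except ValueError as err:
    print("ValueError:", err)
```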


Specify the row labels of the `DataFrame` by passing the `index` argument.

In [15]:
data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
df

|   | A | B | C | D |
|---|---|---|---|---|
| a | -0.256161 | -1.256688 | 1.317764 | -0.401302 |
| b | 0.003773 | -0.316651 | -0.839349 | -0.846119 |
| c | 1.665411 | 0.083434 | 0.577263 | 0.308118 |
| d | 0.692807 | -1.980146 | 0.552804 | 0.975452 |
| e | 0.007482 | -0.203093 | -0.392248 | -0.465112 |
| f | 0.070412 | -1.198515 | -0.126279 | 0.881929 |
| g | 1.177999 | 0.095791 | -0.355332 | -0.231344 |
| h | 0.839825 | -0.341552 | 0.497682 | 1.428032 |
| i | 0.847012 | -0.077953 | -0.575886 | -2.67973 |
| j | 0.869556 | 0.34415 | 1.752955 | -1.288218 |
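With labeled rows and columns, `DataFrame.loc` selects data by label. Unlike positional slicing, a label slice such as `"a":"c"` includes both endpoints. A sketch with predictable values from `np.arange`:

```python
import numpy as np
import pandas as pd

data = np.arange(40).reshape(10, 4)
df = pd.DataFrame(data, columns=["A", "B", "C", "D"],
                  index=list("abcdefghij"))

# A single cell by row label and column label
print(df.loc["b", "C"])  # 6
# Label slices include the end label: rows a, b and c
print(df.loc["a":"c", ["A", "B"]].shape)  # (3, 2)
```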


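Finally, use a range of dates as the index of the `DataFrame`.
The function `pd.date_range` returns a `DatetimeIndex` of evenly spaced dates; given a start date and `periods=10`, it produces ten consecutive days, because the default frequency is daily.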

In [16]:
dates = pd.date_range(start='2024-01-01', periods=10)
data = np.random.randn(10, 4)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=dates)
df

|   | A | B | C | D |
|---|---|---|---|---|
| 2024-01-01 | -0.555331 | 0.331893 | 0.396694 | 0.663127 |
| 2024-01-02 | -0.052035 | 0.511458 | -1.842102 | -0.270067 |
| 2024-01-03 | 0.62055 | 1.254846 | -0.140846 | 1.051753 |
| 2024-01-04 | -0.526195 | 0.743207 | 1.01248 | -0.300316 |
| 2024-01-05 | 0.667526 | -0.572744 | -0.500352 | 1.356439 |
| 2024-01-06 | -0.968069 | 0.717376 | 1.141382 | 0.118077 |
| 2024-01-07 | 1.154459 | 2.115912 | 0.876782 | -1.018108 |
| 2024-01-08 | -1.577624 | -1.149045 | -0.582786 | 0.006208 |
| 2024-01-09 | -1.066755 | 2.213815 | 0.629357 | 0.210622 |
| 2024-01-10 | -1.094261 | -0.825516 | 1.011318 | 0.172624 |
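`date_range` can also take an `end` date instead of `periods`, and a `freq` string to change the spacing between dates. With a date index, a row can then be looked up with a date string. The dates and values below are arbitrary.

```python
import numpy as np
import pandas as pd

# Daily dates from start to end, both included
days = pd.date_range(start="2024-01-01", end="2024-01-05")
print(len(days))  # 5

# Three dates, two days apart
every_other = pd.date_range(start="2024-01-01", periods=3, freq="2D")
print(every_other.strftime("%Y-%m-%d").tolist())
# ['2024-01-01', '2024-01-03', '2024-01-05']

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["A", "B"], index=days)
# The date string is matched against the DatetimeIndex labels
print(df.loc["2024-01-03", "A"])  # 4
```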
