https://www.youtube.com/watch?v=-Ov1N1_FbP8&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=29

# How do I create a pandas DataFrame from another object?

In [3]:
import pandas as pd

In [4]:
# create a DataFrame from a dictionary (keys become column names, values become data)
pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']})

,id,color
0,100,red
1,101,blue
2,102,red


In [5]:
# optionally specify the order of columns and define the index
df = pd.DataFrame({'id':[100, 101, 102], 'color':['red', 'blue', 'red']}, columns=['id', 'color'], index=['a', 'b', 'c'])
df

,id,color
a,100,red
b,101,blue
c,102,red


Documentation for **DataFrame** https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
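
A minimal sketch of how the `columns` and `index` arguments behave with a dictionary input. `columns` selects and orders the dictionary keys, and a name with no matching key produces a column of missing values. The name `'size'` below is hypothetical and not part of the original example.

```python
import pandas as pd

data = {'id': [100, 101, 102], 'color': ['red', 'blue', 'red']}

# 'columns' picks keys from the dictionary in the order given;
# 'size' is a hypothetical name with no matching key, so it is filled with NaN
reordered = pd.DataFrame(data, columns=['color', 'id', 'size'], index=['a', 'b', 'c'])

print(reordered.columns.tolist())
print(reordered['size'].isna().all())
```

The row labels come from `index`, so `reordered.loc['b', 'id']` looks up the value by label rather than by position.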

In [14]:
df

,id,color
a,100,red
b,101,blue
c,102,red


In [15]:
# create a DataFrame from a list of lists (each inner list becomes a row)
pd.DataFrame([[100, 'red'], [101, 'blue'], [102, 'red']], columns=['id', 'color'])

,id,color
0,100,red
1,101,blue
2,102,red
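
Because inner lists carry no names, the `columns` argument is what labels the result. The sketch below compares the same rows with and without it; the variable names are chosen only for illustration.

```python
import pandas as pd

rows = [[100, 'red'], [101, 'blue'], [102, 'red']]

# without 'columns', pandas labels the columns with integers 0, 1, ...
unnamed = pd.DataFrame(rows)

# with 'columns', each position in the inner lists gets a name
named = pd.DataFrame(rows, columns=['id', 'color'])

print(unnamed.columns.tolist())
print(named.columns.tolist())
```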


In [16]:
# create a NumPy array (with shape 4 by 2) and fill it with random numbers between 0 and 1
import numpy as np
arr = np.random.rand(4, 2)
arr

array([[0.68119522, 0.20745331],
       [0.69286037, 0.19209233],
       [0.89239521, 0.88730251],
       [0.72986619, 0.33292876]])

In [17]:
# create a DataFrame from the NumPy array
pd.DataFrame(arr, columns=['one', 'two'])

,one,two
0,0.681195,0.207453
1,0.69286,0.192092
2,0.892395,0.887303
3,0.729866,0.332929
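
The values above change on every run. If reproducible numbers are wanted, one option is a seeded generator. This is a sketch that assumes NumPy 1.17 or later (for `np.random.default_rng`); the seed value 0 is arbitrary and does not reproduce the numbers shown above.

```python
import numpy as np
import pandas as pd

# the seed value 0 is an arbitrary choice for illustration
rng = np.random.default_rng(0)
arr = rng.random((4, 2))  # floats in the half-open interval [0, 1)

df_random = pd.DataFrame(arr, columns=['one', 'two'])

# re-creating the generator with the same seed reproduces the same values
arr_again = np.random.default_rng(0).random((4, 2))
print(np.array_equal(arr, arr_again))
```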


In [98]:
# create a DataFrame of student IDs (100 through 109) and test scores (random integers between 60 and 100)
pd.DataFrame({'student':np.arange(100, 110, 1), 'test':np.random.randint(60, 101, 10)})

,student,test
0,100,71
1,101,75
2,102,76
3,103,76
4,104,76
5,105,70
6,106,76
7,107,67
8,108,70
9,109,94


Documentation for **np.arange** and **np.random**

https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html

https://docs.scipy.org/doc/numpy/reference/routines.random.html
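
Both `np.arange` and `np.random.randint` exclude their upper bound, which is why the example passes 110 and 101 rather than 109 and 100. A small sketch that checks this (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

students = np.arange(100, 110, 1)        # 100, 101, ..., 109 (110 is excluded)
scores = np.random.randint(60, 101, 10)  # 101 is excluded, so 100 is the largest possible score

grades = pd.DataFrame({'student': students, 'test': scores})

print(grades['student'].tolist())
print(grades['test'].between(60, 100).all())
```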

In [19]:
# 'set_index' can be chained with the DataFrame constructor to select an index
pd.DataFrame({'student':np.arange(100, 110, 1), 'test':np.random.randint(60, 101, 10)}).set_index('student')

student,test
100,61
101,79
102,79
103,80
104,96
105,84
106,82
107,91
108,100
109,86


Documentation for **set_index**

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html
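
As an alternative to chaining `set_index`, a named index can be passed straight to the constructor. The sketch below builds the frame both ways from the same arrays and compares them; it is an illustration, not the approach used in the original example.

```python
import numpy as np
import pandas as pd

students = np.arange(100, 110, 1)
scores = np.random.randint(60, 101, 10)

# approach from the section: build the column first, then promote it to the index
chained = pd.DataFrame({'student': students, 'test': scores}).set_index('student')

# alternative sketch: supply a named index directly to the constructor
direct = pd.DataFrame({'test': scores}, index=pd.Index(students, name='student'))

print(chained.equals(direct))
```

Chaining is convenient when the index values already sit in a column; passing `index=` avoids creating that column in the first place.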

In [23]:
# create a new Series using the Series constructor
s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')
s

c     round
b    square
Name: shape, dtype: object

Documentation for **Series**

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html
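
A Series can also be built from a dictionary, in which case the keys become the index labels. This sketch assumes a pandas and Python version that preserve dictionary insertion order (current releases do).

```python
import pandas as pd

# approach from the section: values plus an explicit index
s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')

# alternative sketch: dictionary keys become the index labels
s_from_dict = pd.Series({'c': 'round', 'b': 'square'}, name='shape')

print(s.equals(s_from_dict))
```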

In [24]:
# concatenate the DataFrame and the Series (use axis=1 to concatenate columns)
pd.concat([df, s], axis=1)

,id,color,shape
a,100,red,
b,101,blue,square
c,102,red,round


Notes:

- The Series name became the column name in the DataFrame.
- The Series data was aligned to the DataFrame by its index.
- The 'shape' for row 'a' was marked as a missing value (NaN) because that index was not present in the Series.

Documentation for **concat**

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html
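
The alignment behaviour in the notes can be checked directly. The sketch below rebuilds `df` and `s` from the earlier cells so it runs on its own, then looks up individual cells by label.

```python
import pandas as pd

df = pd.DataFrame({'id': [100, 101, 102], 'color': ['red', 'blue', 'red']},
                  index=['a', 'b', 'c'])
s = pd.Series(['round', 'square'], index=['c', 'b'], name='shape')

combined = pd.concat([df, s], axis=1)

# values are matched by index label, not by position in the Series
print(combined.loc['c', 'shape'])  # round
print(combined.loc['b', 'shape'])  # square

# 'a' has no entry in the Series, so it is filled with a missing value
print(pd.isna(combined.loc['a', 'shape']))  # True
```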