In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # equivalent to from matplotlib import pyplot as plt

Create a Series (a labeled 1D array) by passing a list of values; pandas supplies a default RangeIndex.
`s` is a pandas Series of length 6. By default, its **index** is 0, 1, 2, 3, 4, 5.

The entry at position 3 is missing: np.nan is the floating-point marker pandas uses for missing data, which is also why the whole Series has dtype float64.

In [3]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
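
To make the default index and the missing entry concrete, here is a minimal sketch that rebuilds the same Series and inspects it. It uses only standard pandas calls (`isna`, `mean`); the variable names `missing_count` and `average` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# The default index is a RangeIndex covering positions 0..5.
print(s.index)

# np.nan forces a floating-point dtype for the whole Series.
print(s.dtype)

# isna() flags missing entries; summing the booleans counts them.
missing_count = int(s.isna().sum())

# Reductions such as mean() skip NaN by default (skipna=True).
average = s.mean()
print(missing_count, average)
```

Because NaN is skipped, the mean is taken over the five present values rather than all six positions.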


Creating a DataFrame by passing a NumPy array with a datetime index using date_range() and labeled columns.

pd.date_range(start, periods=...) creates a sequence of evenly spaced dates.

"20130101" is interpreted as January 1, 2013; pandas parses many common date formats here.

periods=6 means “generate 6 consecutive dates”; they are one day apart because the default frequency is daily (freq='D').

Each entry is a Timestamp object representing midnight at the start of that calendar day.

Why it’s here: later, we use dates as the row index for our DataFrame. This shows how to create a date‐based index rather than the default integer index.

In [6]:
dates = pd.date_range("20130101", periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
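
As a quick check on the points above, the sketch below shows that the elements are Timestamp objects and that spelling out the end date and daily frequency gives the same index. The names `first` and `explicit` are illustrative only.

```python
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# Each element of a DatetimeIndex is a pandas Timestamp.
first = dates[0]
print(type(first).__name__, first)

# The same index, written with an explicit end date and daily frequency.
explicit = pd.date_range(start="2013-01-01", end="2013-01-06", freq="D")
print(dates.equals(explicit))
```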

`np.random.randn(6, 4)` creates a 2D NumPy array of shape (6, 4) whose entries are drawn from a standard normal distribution (mean = 0, standard deviation = 1).

In other words, six rows and four columns of random floating-point numbers.

`index=dates`

Tells Pandas to label the six rows of the DataFrame with the six timestamps from dates (2013-01-01 through 2013-01-06).

Without specifying index, Pandas would default to 0, 1, 2, 3, 4, 5 as row labels. By giving index=dates, each row gets a date as its label.

`columns=list("ABCD")`

The expression list("ABCD") produces ['A', 'B', 'C', 'D'].

That becomes the column names for the four columns in the DataFrame. Instead of default names 0, 1, 2, 3, each column is labeled “A,” “B,” “C,” and “D.”

Putting it all together:

We ask Pandas to wrap that random 6×4 NumPy array into a labeled table: six rows keyed by dates, and four columns named A–D.

Resulting df:
In an interactive session, the output looks like this (the random numbers differ on every run):

In [17]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

                   A         B         C         D
2013-01-01 -1.552850 -1.292249  0.272861  1.493503
2013-01-02 -2.291571 -0.272320  0.348841 -0.737952
2013-01-03  0.045855 -0.167653 -0.008555  0.646295
2013-01-04 -0.149043  0.548698  0.940065 -0.009265
2013-01-05 -0.721696 -0.058577 -0.009584 -0.130624
2013-01-06 -0.714172  0.712125 -0.302254  0.058291
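
Since the values change every time the cell runs, a seeded generator is one way to get repeatable numbers. This is a sketch rather than what the notebook does: it swaps `np.random.randn` for NumPy's `default_rng` Generator API, and the seed value 0 and the names `labeled` and `unlabeled` are arbitrary choices. It also shows the default integer labels you get when `index` and `columns` are omitted.

```python
import numpy as np
import pandas as pd

# Assumption: seed 0 is arbitrary; any fixed seed gives repeatable values.
rng = np.random.default_rng(seed=0)
values = rng.standard_normal((6, 4))  # same distribution as np.random.randn(6, 4)

dates = pd.date_range("20130101", periods=6)

# Explicit labels: dates for rows, letters for columns.
labeled = pd.DataFrame(values, index=dates, columns=list("ABCD"))

# No labels given: pandas falls back to 0..5 for rows and 0..3 for columns.
unlabeled = pd.DataFrame(values)

print(labeled.shape)
print(list(unlabeled.index), list(unlabeled.columns))
```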


In [15]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

df2
df2.dtypes

A          float64
B    datetime64[s]
C          float32
D            int32
E         category
F           object
dtype: object
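
The dict form above mixes scalars with length-4 columns. The sketch below uses a trimmed copy of that dict to show that scalars are repeated to match the length of the other columns, and that each column keeps its own dtype. The name `frame` and the use of `select_dtypes` are additions for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "A": 1.0,                                                  # scalar, repeated
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),  # sets the length
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",                                                # scalar, repeated
    }
)

print(frame.shape)

# Scalars were broadcast down all four rows.
print(frame["A"].tolist(), frame["F"].tolist())

# select_dtypes picks columns by dtype; here, only the numeric ones.
numeric_columns = list(frame.select_dtypes(include="number").columns)
print(numeric_columns)
```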


In [10]:
df.head()

                   A         B         C         D
2013-01-01 -0.371618 -1.590625 -0.818411 -0.065397
2013-01-02 -1.004928  1.207143 -0.512048 -1.154686
2013-01-03  0.325276  0.981668  0.232547  0.819821
2013-01-04 -0.687242 -1.234332  1.175667  1.551360
2013-01-05 -0.028582 -0.367747  1.858197  1.652688


In [12]:
df.tail(3)

                   A         B         C         D
2013-01-04 -0.687242 -1.234332  1.175667  1.551360
2013-01-05 -0.028582 -0.367747  1.858197  1.652688
2013-01-06  0.444212 -1.082082 -0.134265 -0.847579
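
The In [10] and In [12] outputs show different numbers from the In [17] table, which is consistent with `df` having been regenerated with fresh random values between those runs. To see what `head` and `tail` select without randomness getting in the way, here is a small sketch on a deterministic frame; the frame contents and the name `demo` are made up for the example.

```python
import pandas as pd

# Hypothetical demo frame: one column counting 0..5 over the same six dates.
demo = pd.DataFrame({"A": range(6)}, index=pd.date_range("20130101", periods=6))

first_rows = demo.head()    # no argument: the first 5 rows
last_rows = demo.tail(3)    # the last 3 rows

print(first_rows)
print(last_rows)
```

Both methods return a new DataFrame and leave the original unchanged.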


In [13]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
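
The row labels have a counterpart for columns, `df.columns`, and both can be used for label-based lookup with `.loc`. The sketch below uses a deterministic frame (values 0 to 23) instead of random data so the looked-up values are predictable; the names `demo` and `row` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
demo = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

print(demo.index)    # row labels (DatetimeIndex)
print(demo.columns)  # column labels

# Look up one row by its date label; the result is a Series keyed by column name.
row = demo.loc[pd.Timestamp("2013-01-03")]
print(row["A"])
```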

In [19]:
df.to_numpy()

array([[-1.55284972, -1.29224931,  0.27286064,  1.49350279],
       [-2.29157109, -0.27232005,  0.34884078, -0.73795161],
       [ 0.04585549, -0.16765328, -0.00855532,  0.64629527],
       [-0.14904317,  0.5486983 ,  0.94006525, -0.00926466],
       [-0.7216958 , -0.05857686, -0.00958449, -0.13062381],
       [-0.71417236,  0.7121246 , -0.3022536 ,  0.05829116]])
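
`to_numpy()` drops the row and column labels and returns only the values. When every column shares a dtype, the array keeps it; when column dtypes differ, NumPy has to find one common dtype, which is often `object`. A minimal sketch with two made-up frames, `numeric` and `mixed`:

```python
import pandas as pd

# Hypothetical frames for illustration.
numeric = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
mixed = pd.DataFrame({"x": [1.0, 2.0], "label": ["a", "b"]})

numeric_array = numeric.to_numpy()
mixed_array = mixed.to_numpy()

print(numeric_array.dtype, numeric_array.shape)
print(mixed_array.dtype, mixed_array.shape)
```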

In [20]:
df.describe()

              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.897246 -0.088329  0.206896  0.220042
std    0.881764  0.712318  0.427681  0.764573
min   -2.291571 -1.292249 -0.302254 -0.737952
25%   -1.345061 -0.246153 -0.009327 -0.100284
50%   -0.717934 -0.113115  0.132153  0.024513
75%   -0.290325  0.396880  0.329846  0.499294
max    0.045855  0.712125  0.940065  1.493503
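
Each row of the `describe()` output corresponds to a per-column reduction that can also be called directly. The sketch below checks a few of them on a deterministic frame; note that `std` is the sample standard deviation (ddof=1) and the 50% row is the median. The names `demo` and `summary` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
demo = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

summary = demo.describe()

# Each summary row matches the corresponding DataFrame method.
print(np.allclose(summary.loc["mean"], demo.mean()))
print(np.allclose(summary.loc["std"], demo.std()))      # ddof=1 by default
print(np.allclose(summary.loc["50%"], demo.median()))
```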


In [21]:
df.T  # T is a property, not a method: df.T() raises "TypeError: 'DataFrame' object is not callable"
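
`T` is an attribute that returns the transposed frame, so it is accessed without parentheses; `df.transpose()` is the equivalent method call. A minimal sketch on a deterministic frame (the names `demo` and `transposed` are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
demo = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

transposed = demo.T  # rows become columns and vice versa
print(transposed.shape)
print(list(transposed.index))

# The method form gives the same result.
print(transposed.equals(demo.transpose()))

# Calling the property's result reproduces the error.
try:
    demo.T()
except TypeError as exc:
    print("TypeError:", exc)
```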