# Introduction to Pandas Data Structure

In [1]:
# importing libraries :
import pandas as pd
import numpy as np

Fundamentally, ```data alignment is intrinsic```: the link between labels and data will not be broken unless we break it explicitly.
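A minimal sketch of what intrinsic alignment means in practice. The two small Series `s1` and `s2` are made up for illustration: they hold values for the same labels in a different order, and arithmetic pairs the values by label rather than by position.

```python
import pandas as pd

# Same labels, different order (illustrative data)
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([30, 10, 20], index=["c", "a", "b"])

# Values are matched by label, not by position
total = s1 + s2
print(total.to_dict())   # a -> 11, b -> 22, c -> 33
```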

**Pandas data structures:**
* Series
* DataFrame

## Series

```Series``` is a one-dimensional labeled array in pandas. Its dtype can be anything, e.g. int, float, str, object, ...
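A short sketch of how pandas picks the dtype. The example values are arbitrary: pandas infers the dtype from the data, falls back to `object` when the values are mixed Python types, and also accepts an explicit `dtype=` argument.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])            # inferred as int64
floats = pd.Series([1.5, 2.5])         # inferred as float64
mixed = pd.Series([1, "two", 3.0])     # mixed Python types -> object

print(ints.dtype, floats.dtype, mixed.dtype)

# A dtype can also be requested explicitly
as_float = pd.Series([1, 2, 3], dtype="float64")
print(as_float.dtype)
```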

In [2]:
# s = pd.Series(data, index=index)

data can be:
* a dictionary
* a scalar value (repeated for every index label)
* a NumPy 1-D array

index:
* a list of axis labels
* may be datetimes, a list of numbers, ...; for dict data, the dict keys are already used as the index (and the dict values as the data)
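The options listed above can be sketched as follows (the labels, values, and dates are arbitrary examples). With dict data, passing `index=` selects and reorders the keys, and a label without a matching key gets `NaN`. A date range can also serve as the index.

```python
import numpy as np
import pandas as pd

# dict data + explicit index: "d" has no matching key, so it becomes NaN
d = {"a": 0.0, "b": 1.0, "c": 2.0}
from_dict = pd.Series(d, index=["b", "c", "d", "a"])
print(from_dict)

# 1-D NumPy array data: the index length must match the data length
from_array = pd.Series(np.arange(3), index=["x", "y", "z"])

# a date range used as the index
dates = pd.date_range("2024-01-01", periods=3, freq="D")
daily = pd.Series([10, 20, 30], index=dates)
print(daily.index[0])   # 2024-01-01 00:00:00
```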

In [3]:
d = {"a": 0.0, "b": 1.0, "c": 2.0}    # from dict

pd.Series(d)

a    0.0
b    1.0
c    2.0
dtype: float64

In [5]:
series = pd.Series(np.random.randn(6), index=["a", "b", "c", "d", "e", "f"])  # from a NumPy array

series

a   -0.414610
b    2.525017
c   -1.387076
d   -2.410352
e    0.898796
f   -2.796450
dtype: float64

In [6]:
pd.Series(1.0, index=["m", "n", "p", "q", "y"])    # creating from a "scalar" value

m    1.0
n    1.0
p    1.0
q    1.0
y    1.0
dtype: float64

In [7]:
series.median()     # finding median

-0.9008427637263542

In [8]:
series.mean()     # finding mean

-0.5974456570849481

In [9]:
series.describe()     # statistical summary

count    6.000000
mean    -0.597446
std      2.038024
min     -2.796450
25%     -2.154533
50%     -0.900843
75%      0.570445
max      2.525017
dtype: float64
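Because the series above is random, its summary is hard to check by hand. The sketch below runs `describe()` on a small fixed Series (made-up values) to show two details: `std` is the sample standard deviation (`ddof=1`), and the quartiles are linearly interpolated between the sorted values.

```python
import pandas as pd

vals = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
summary = vals.describe()

# sample variance = (4 + 1 + 0 + 1 + 4) / (5 - 1) = 2.5, so std = sqrt(2.5)
print(summary["std"])                   # about 1.5811
# quartile positions fall exactly on sorted values here
print(summary["25%"], summary["75%"])   # 2.0 4.0
```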

In [10]:
np.exp(series)    # element-wise exponential

a     0.660598
b    12.491108
c     0.249805
d     0.089784
e     2.456643
f     0.061026
dtype: float64
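A quick sketch (with arbitrary values) showing that NumPy ufuncs such as `np.sqrt`, `np.exp`, and `np.log` act element-wise on a Series and return a Series that keeps the original index.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

roots = np.sqrt(s)             # still a Series, same labels
print(roots["c"])              # 3.0

# np.log undoes np.exp element-wise, up to floating-point rounding
back = np.log(np.exp(s))
print(np.allclose(back, s))    # True
```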

In [11]:
series.iloc[4]    # accessing a specific value by position

0.8987959789288953

In [12]:
series[2:4]   # slicing the rows

c   -1.387076
d   -2.410352
dtype: float64

In [14]:
series.iloc[[2, 4]]  # accessing multiple rows by position

c   -1.387076
e    0.898796
dtype: float64
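Using a plain integer key with `[]` on a Series whose labels are not integers is deprecated in recent pandas versions (it emits a FutureWarning), so explicit accessors are the safer choice. A minimal sketch with made-up data: `.iloc` is purely positional and excludes the end of a slice, while `.loc` is purely label-based and includes the end label.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# positional: slice end is excluded
print(s.iloc[1:3].tolist())        # [20, 30]
print(s.iloc[[0, 4]].tolist())     # [10, 50]

# label-based: slice end label is included
print(s.loc["b":"d"].tolist())     # [20, 30, 40]
print(s.loc[["a", "e"]].tolist())  # [10, 50]
```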

In [17]:
# checking datatype :

series.dtype

dtype('float64')

In [20]:
series.to_numpy()    # converting into NumPy array

array([-0.41460959,  2.52501706, -1.38707594, -2.4103519 ,  0.89879598,
       -2.79644956])

In [22]:
series.array  # the pandas ExtensionArray backing the Series

<PandasArray>
[-0.41460958700073847,    2.525017061557021,    -1.38707594045197,
   -2.410351897002969,   0.8987959789288953,  -2.7964495585399276]
Length: 6, dtype: float64

In [23]:
series.array[0]

-0.41460958700073847

In [24]:
type(series.array)

pandas.core.arrays.numpy_.PandasArray

In [25]:
type(series.to_numpy())

numpy.ndarray
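The difference between `.array` and `.to_numpy()` matters most for pandas extension dtypes. A sketch assuming the nullable `"Int64"` dtype (capital `I`) with one missing value: `.array` keeps the pandas `IntegerArray` with its missing marker, while `.to_numpy()` needs a NumPy dtype that can represent the missing value, supplied here through `na_value`.

```python
import numpy as np
import pandas as pd

# nullable integer Series with a missing entry
s = pd.Series([1, None, 3], dtype="Int64")

print(type(s.array).__name__)   # IntegerArray

# convert to a plain float64 ndarray, mapping the missing value to NaN
arr = s.to_numpy(dtype="float64", na_value=np.nan)
print(arr.dtype)                # float64
```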

```Series``` also behaves like a ```dict```.

In [27]:
series["a"], series["b"]      # a missing key raises KeyError; use the .get() method to avoid the error.

(-0.41460958700073847, 2.525017061557021)

In [28]:
"e" in series, "f" in series, "g" in series

(True, True, False)

In [29]:
# a missing key raises KeyError; use the .get() method to avoid the error.

series.get("h")   # key not present: returns None, so nothing is displayed; a default value to return instead can be passed as the second argument

In [33]:
series.get("b"), series.get("h", "not_available"), series.get("j", np.nan)

(2.525017061557021, 'not_available', nan)

In [34]:
series.a    # attribute-style access by label

-0.41460958700073847
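A small sketch of the dict-like behaviour, using made-up data: the `in` operator tests the index labels (like dict keys), not the values, and `[]` with a missing label raises `KeyError` while `.get()` returns a default.

```python
import pandas as pd

s = pd.Series([0.5, 1.5], index=["a", "b"])

print("a" in s)    # True: "a" is a label
print(0.5 in s)    # False: 0.5 is a value, not a label

try:
    s["z"]
except KeyError:
    print("KeyError for label 'z'")

print(s.get("z", 0.0))   # 0.0, the supplied default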

## Vectorized operations and label alignment with Series

```Looping through a Series value by value is usually not necessary; operations apply to the whole Series at once.```

In [35]:
series+series

a   -0.829219
b    5.050034
c   -2.774152
d   -4.820704
e    1.797592
f   -5.592899
dtype: float64

In [36]:
series*2

a   -0.829219
b    5.050034
c   -2.774152
d   -4.820704
e    1.797592
f   -5.592899
dtype: float64
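To make the point about looping concrete, here is a sketch (arbitrary data) comparing an explicit Python loop with the vectorized expression. Both produce the same Series; the vectorized form is shorter and runs in optimized NumPy code rather than the Python interpreter.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5, dtype="float64"), index=list("abcde"))

# element-by-element loop, rebuilt into a Series with the same labels
looped = pd.Series([v * 2 for v in s], index=s.index)

# vectorized: one expression over the whole Series
vectorized = s * 2

print(looped.equals(vectorized))   # True
```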

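In [39]:
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])  # 5-element Series for the alignment examples

s[1:]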

b    0.761259
c    1.560026
d    1.038568
e   -0.283605
dtype: float64

In [40]:
s[:-1]

a   -0.203787
b    0.761259
c    1.560026
d    1.038568
dtype: float64

In [41]:
s[1:] + s[:-1]

a         NaN
b    1.522518
c    3.120052
d    2.077135
e         NaN
dtype: float64

```The result of an operation between unaligned Series will have the union of the indexes involved. ```

```If a label is not found in one Series or the other, the result will be marked as missing (NaN).```
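When the `NaN` results are unwanted, the method form of the operator can treat a missing label as a given value. A sketch with made-up data using `Series.add` and its `fill_value` parameter: label `"a"` exists only in `s1`, so `+` gives `NaN` there, while `add(..., fill_value=0)` treats the absent side as 0.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["b", "c"])

plain = s1 + s2
print(plain["a"])        # nan

filled = s1.add(s2, fill_value=0)
print(filled["a"])       # 1.0
print(filled["c"])       # 23.0
```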

## Name attribute

```Series also has a name attribute:```

In [46]:
series = pd.Series(np.random.randn(6), name="my_series")

In [44]:
series

0    0.152772
1   -1.242601
2   -0.584918
3   -0.463636
4    0.430179
5    0.842802
Name: my_series, dtype: float64

In [49]:
series.rename("your_series", inplace=True) # either pass inplace=True or assign the result to a variable

0    0.141019
1    0.403357
2    0.402901
3   -0.293804
4    0.633683
5    0.480072
Name: your_series, dtype: float64

In [50]:
series

0    0.141019
1    0.403357
2    0.402901
3   -0.293804
4    0.633683
5    0.480072
Name: your_series, dtype: float64
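A minimal sketch (illustrative data) of the non-inplace form: `rename()` with a scalar returns a new Series carrying the new name and leaves the original unchanged. The name is also reused as the column label when the Series is turned into a DataFrame with `to_frame()`.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="my_series")

renamed = s.rename("your_series")   # new object; s keeps its old name
print(s.name, renamed.name)         # my_series your_series

df = renamed.to_frame()
print(df.columns.tolist())          # ['your_series']
```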

## DataFrame

pd.DataFrame(...)

(Continued in the next notebook.)