# Primary Pandas Data Structures

### Series - One Dimensional Data
![image.png](attachment:image.png)

### Data Frames - Group of Series
![image-2.png](attachment:image-2.png)

### Importing Pandas

In [1]:
import pandas as pd

### Creating a Series

In [2]:
s = pd.Series([10, 20, 30])
s

0    10
1    20
2    30
dtype: int64

The index values are labels for the data; they let you retrieve individual entries later.

By default the labels are the integers 0, 1, 2, and so on. You can supply more meaningful labels with the `index` argument.

In [3]:
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
s

a    10
b    20
c    30
d    40
dtype: int64

In [4]:
s.values

array([10, 20, 30, 40], dtype=int64)

In [5]:
s.index

Index(['a', 'b', 'c', 'd'], dtype='object')
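As a small sketch of the same idea, a Series can also be built from a dictionary, in which case the keys become the index labels. The variable name `labeled` and its contents are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the dictionary keys become the index labels
labeled = pd.Series({'a': 10, 'b': 20, 'c': 30, 'd': 40})

# The labels and the underlying values can be inspected separately
labels = list(labeled.index)     # ['a', 'b', 'c', 'd']
values = labeled.to_numpy()      # NumPy array holding 10, 20, 30, 40

# Labels support membership tests, much like dictionary keys
has_b = 'b' in labeled
```

Passing a list together with `index=[...]`, as in the cell above, gives the same result; the dictionary form is just more compact when labels and values already come in pairs.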

In [6]:
s[0]

10

In [7]:
s[1]

20

In [8]:
s['c']

30

In [9]:
s['d']

40

In [10]:
s[0:2]

a    10
b    20
dtype: int64

In [11]:
s[['b', 'c', 'd']]

b    20
c    30
d    40
dtype: int64
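The lookups above mix positions (`s[0]`) and labels (`s['c']`) in the same bracket syntax. Recent pandas releases warn about integer positions inside plain brackets when the index holds non-integer labels, so a more explicit sketch uses `.iloc` for positions and `.loc` for labels. The Series is rebuilt here so the example is self-contained.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

first = s.iloc[0]            # by position -> 10
third = s.loc['c']           # by label -> 30

by_position = s.iloc[0:2]    # positions 0 and 1; the end is excluded
by_label = s.loc['a':'c']    # labels 'a' through 'c'; the end is included
```

Note the asymmetry: positional slices exclude the end point, while label slices include it.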

In [12]:
s[0] = 100
s

a    100
b     20
c     30
d     40
dtype: int64

In [13]:
s['b'] = 200
s

a    100
b    200
c     30
d     40
dtype: int64

In [14]:
s[['c', 'd']] = (300, 400)
s

a    100
b    200
c    300
d    400
dtype: int64

In [15]:
import numpy as np

In [49]:
array = np.random.random(10)
array

array([0.69502359, 0.85282892, 0.1428039 , 0.40618306, 0.66856504,
       0.83468825, 0.25256577, 0.99122357, 0.58398666, 0.66137549])

In [50]:
s = pd.Series(array)
s

0    0.695024
1    0.852829
2    0.142804
3    0.406183
4    0.668565
5    0.834688
6    0.252566
7    0.991224
8    0.583987
9    0.661375
dtype: float64

In [51]:
s[s>0.5]

0    0.695024
1    0.852829
4    0.668565
5    0.834688
7    0.991224
8    0.583987
9    0.661375
dtype: float64
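The expression `s > 0.5` produces a boolean Series, and indexing with it keeps only the entries marked `True`. The sketch below combines two such conditions; it uses fixed, illustrative values instead of random ones so the result is reproducible.

```python
import pandas as pd

s = pd.Series([0.70, 0.85, 0.14, 0.41, 0.99])

mask = s > 0.5                          # boolean Series, one entry per element
above_half = s[mask]                    # keeps the entries at index 0, 1 and 4

# Combine conditions with & (and) or | (or); each side needs parentheses
mid_range = s[(s > 0.5) & (s < 0.9)]    # keeps the entries at index 0 and 1
```

The original index labels are preserved in the filtered result, which is why the output above skips 2, 3 and 6.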

In [52]:
s/2

0    0.347512
1    0.426414
2    0.071402
3    0.203092
4    0.334283
5    0.417344
6    0.126283
7    0.495612
8    0.291993
9    0.330688
dtype: float64

In [53]:
np.log(s)

0   -0.363809
1   -0.159196
2   -1.946283
3   -0.900951
4   -0.402622
5   -0.180697
6   -1.376084
7   -0.008815
8   -0.537877
9   -0.413434
dtype: float64

In [54]:
s = s-0.5
s

0    0.195024
1    0.352829
2   -0.357196
3   -0.093817
4    0.168565
5    0.334688
6   -0.247434
7    0.491224
8    0.083987
9    0.161375
dtype: float64

In [55]:
s = np.log(s)
s

The logarithm is undefined for negative numbers, so NumPy emits a `RuntimeWarning` here and the affected entries become `NaN`.

0   -1.634635
1   -1.041772
2         NaN
3         NaN
4   -1.780434
5   -1.094556
6         NaN
7   -0.710856
8   -2.477097
9   -1.824021
dtype: float64
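The `NaN` entries appear where the input to `np.log` was negative. One possible way to make that explicit, sketched here with a few illustrative values, is to mask the non-positive entries with `Series.where` before taking the logarithm.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.195, 0.353, -0.357, -0.094])

# Keep the positive entries and replace the rest with NaN before the log
positive_only = s.where(s > 0)
logged = np.log(positive_only)
```

The result has the same `NaN` positions as before; the difference is that the missing values are introduced deliberately rather than as a side effect of an invalid operation.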

In [56]:
s.isnull()

0    False
1    False
2     True
3     True
4    False
5    False
6     True
7    False
8    False
9    False
dtype: bool

In [57]:
s.notnull()

0     True
1     True
2    False
3    False
4     True
5     True
6    False
7     True
8     True
9     True
dtype: bool
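`isnull()` and `notnull()` return boolean masks, which are usually a stepping stone to counting, dropping, or filling the missing values. A minimal sketch with illustrative numbers:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.63, np.nan, -1.78, np.nan])

n_missing = s.isnull().sum()    # True counts as 1, so this counts the NaNs
clean = s.dropna()              # drop the missing entries
filled = s.fillna(0.0)          # or replace them with a chosen value
```

Whether dropping or filling is appropriate depends on the data; the fill value `0.0` here is only a placeholder.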

In [31]:
ser = pd.Series([10, 20, 10, 30, 30, 20, 40, 10, 40, 50, 50, 10, 30, 10])

In [33]:
ser.unique()

array([10, 20, 30, 40, 50], dtype=int64)

In [34]:
ser.value_counts()

10    5
30    3
50    2
20    2
40    2
dtype: int64
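`value_counts()` sorts by frequency, most common value first, so values with the same count (20, 40 and 50 here) may appear in any order relative to each other. The sketch below shows two variations on the same data: relative frequencies, and counts ordered by value.

```python
import pandas as pd

ser = pd.Series([10, 20, 10, 30, 30, 20, 40, 10, 40, 50, 50, 10, 30, 10])

counts = ser.value_counts()                  # most frequent value first
shares = ser.value_counts(normalize=True)    # fractions that sum to 1
by_value = counts.sort_index()               # order by the value instead
```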

In [35]:
ser.isin([40, 50])

0     False
1     False
2     False
3     False
4     False
5     False
6      True
7     False
8      True
9      True
10     True
11    False
12    False
13    False
dtype: bool

In [36]:
ser[ser.isin([40, 50])]

6     40
8     40
9     50
10    50
dtype: int64
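Because `isin` returns a boolean mask, it can be inverted with `~` to select everything that is not in the list. A short sketch on a smaller, illustrative Series:

```python
import pandas as pd

ser = pd.Series([10, 20, 40, 50, 30])

mask = ser.isin([40, 50])
inside = ser[mask]      # entries equal to 40 or 50
outside = ser[~mask]    # every other entry
```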

### Creating a DataFrame

In [19]:
df = pd.DataFrame([[0, 1, 2],[3, 4, 5], [6, 7, 8]])
df

   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8
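Without further arguments, both the rows and the columns get integer labels. As a sketch of the "group of Series" idea from the introduction, the same numbers can be given column names and row labels, and each column can then be pulled out as a Series. The names `x`, `y`, `z` and `r1` to `r3` are hypothetical.

```python
import pandas as pd

# Same numbers as above, with made-up column names and row labels
df = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    columns=['x', 'y', 'z'],
    index=['r1', 'r2', 'r3'],
)

column = df['y']            # a single column is a Series
cell = df.loc['r2', 'z']    # row label, then column label -> 5
```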


In [20]:
df = pd.read_csv('sample_data.csv', sep=',')
df

   line_id  max_pressure  min_temperature  max_temperature  nominal_diameter  minimum_thickness  fluid
0        1            20                0              250                14                  8      P
1        2            10              -10              200                10                  8     PC
2        3            15              -29              150                12                  8      F
3        4             2                0               50                 4                  8     FG
4        5             4                0               50                 2                  8     CN


Show some attributes of the DataFrame:

In [21]:
df.shape

(5, 7)

In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   line_id             5 non-null      int64 
 1    max_pressure       5 non-null      int64 
 2    min_temperature    5 non-null      int64 
 3    max_temperature    5 non-null      int64 
 4    nominal_diameter   5 non-null      int64 
 5    minimum_thickness  5 non-null      int64 
 6    fluid              5 non-null      object
dtypes: int64(6), object(1)
memory usage: 408.0+ bytes
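In the `df.info()` listing, every column after `line_id` is shown with a leading space, which suggests the file has a space after each comma and that the space ended up in the column names. A sketch of one way to handle this uses the `skipinitialspace` option of `read_csv`. The in-memory text below is a stand-in for `sample_data.csv`, whose exact contents are an assumption.

```python
import io
import pandas as pd

# Stand-in for sample_data.csv; the spaces after the commas are an assumption
csv_text = "line_id, max_pressure, fluid\n1, 20, P\n2, 10, PC\n"

# Default parsing keeps the leading space in the column names
raw = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True drops whitespace that follows each delimiter
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
```

With the default settings a lookup such as `raw['fluid']` raises a `KeyError`, because the column is actually named `' fluid'`.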


In [23]:
pd.set_option('display.max_columns', 4)
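`pd.set_option` changes the display setting for the rest of the session. If the limit is only needed for a single output, a possible alternative is `pd.option_context`, which restores the previous value when the block ends. A minimal sketch:

```python
import pandas as pd

before = pd.get_option('display.max_columns')

# The setting applies only inside the with block
with pd.option_context('display.max_columns', 4):
    inside = pd.get_option('display.max_columns')

after = pd.get_option('display.max_columns')    # back to the earlier value
```

`pd.reset_option('display.max_columns')` returns the option to its library default instead.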

In [24]:
df

   line_id  max_pressure  ...  minimum_thickness  fluid
0        1            20  ...                  8      P
1        2            10  ...                  8     PC
2        3            15  ...                  8      F
3        4             2  ...                  8     FG
4        5             4  ...                  8     CN


In [25]:
df.head(3)

   line_id  max_pressure  ...  minimum_thickness  fluid
0        1            20  ...                  8      P
1        2            10  ...                  8     PC
2        3            15  ...                  8      F


In [26]:
df.head(10)

   line_id  max_pressure  ...  minimum_thickness  fluid
0        1            20  ...                  8      P
1        2            10  ...                  8     PC
2        3            15  ...                  8      F
3        4             2  ...                  8     FG
4        5             4  ...                  8     CN


In [27]:
df.tail(2)

   line_id  max_pressure  ...  minimum_thickness  fluid
3        4             2  ...                  8     FG
4        5             4  ...                  8     CN


In [28]:
df.tail(20)

   line_id  max_pressure  ...  minimum_thickness  fluid
0        1            20  ...                  8      P
1        2            10  ...                  8     PC
2        3            15  ...                  8      F
3        4             2  ...                  8     FG
4        5             4  ...                  8     CN
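As the outputs above show, `head(n)` and `tail(n)` return the whole DataFrame when `n` exceeds the number of rows. The sketch below repeats this on a small stand-in frame (only the `line_id` column is reproduced) and adds `iloc` for selecting rows from the middle.

```python
import pandas as pd

# Stand-in with the same five line_id values as the sample data
df = pd.DataFrame({'line_id': [1, 2, 3, 4, 5]})

first_three = df.head(3)
last_two = df.tail(2)
everything = df.head(10)    # more rows requested than exist: all are returned
middle = df.iloc[1:4]       # rows at positions 1, 2 and 3
```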
