### Pandas DataFrame

Start by importing the Python modules

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt    #For charts

Check which version of pandas we are using

In [2]:
print(pd.__version__)

2.2.2


Create a DataFrame from a dictionary of lists

In [3]:
data={'vegetables':['beetroot','carrot','tomato','kovakkai','coconut'],
      'color':['purple','orange','red','green','white'],
      'unit':['1kg','1kg','1kg','1kg','1piece'],
      'min':[15,25,53,30,'seventeen'],
      'max':[40,45,90,45,38],
     }

In [6]:
market=pd.DataFrame(data)

In [7]:
market

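|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 4   | coconut    | white  | 1piece | seventeen | 38  |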


**Saving a DataFrame to a Python object**

In [9]:
m=market.to_dict()   #Saving dataframe as dictionary

In [11]:
m   #Printing the dictionary

{'vegetables': {0: 'beetroot',
  1: 'carrot',
  2: 'tomato',
  3: 'kovakkai',
  4: 'coconut'},
 'color': {0: 'purple', 1: 'orange', 2: 'red', 3: 'green', 4: 'white'},
 'unit': {0: '1kg', 1: '1kg', 2: '1kg', 3: '1kg', 4: '1piece'},
 'min': {0: 15, 1: 25, 2: 53, 3: 30, 4: 'seventeen'},
 'max': {0: 40, 1: 45, 2: 90, 3: 45, 4: 38}}
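
The default `to_dict()` layout above nests values as `{column: {index: value}}`. As a hedged sketch, the `orient` parameter of `to_dict` can produce other layouts; the variable names `records` and `columns_as_lists` are only illustrative.

```python
import pandas as pd

# Same sample data as the notebook.
data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

# One dictionary per row, collected in a list.
records = market.to_dict(orient="records")

# One plain list per column.
columns_as_lists = market.to_dict(orient="list")

print(records[0])
print(columns_as_lists["max"])
```

The `records` layout is often convenient when each row has to be handled as a separate item, for example when writing JSON.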

Converting to a NumPy array

In [14]:
matrix=market.values   #To a 2D NumPy array (dtype is object because the column types are mixed)

In [15]:
matrix

array([['beetroot', 'purple', '1kg', 15, 40],
       ['carrot', 'orange', '1kg', 25, 45],
       ['tomato', 'red', '1kg', 53, 90],
       ['kovakkai', 'green', '1kg', 30, 45],
       ['coconut', 'white', '1piece', 'seventeen', 38]], dtype=object)
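
The pandas documentation recommends `DataFrame.to_numpy()` over the `.values` attribute for new code. A minimal sketch, assuming the same sample data, showing that both give the same array here:

```python
import numpy as np
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

# to_numpy() is the method form of the same conversion.
arr = market.to_numpy()

print(arr.shape)                            # (rows, columns)
print(arr.dtype)                            # object, because column types are mixed
print(np.array_equal(arr, market.values))   # both conversions agree
```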

### Working with whole dataframe

**i. Peek at the DataFrame contents and structure**

In [18]:
market.info()   #print columns and datatypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   vegetables  5 non-null      object
 1   color       5 non-null      object
 2   unit        5 non-null      object
 3   min         5 non-null      object
 4   max         5 non-null      int64 
dtypes: int64(1), object(4)
memory usage: 332.0+ bytes


In [24]:
dfh=market.head(3)   #head(n) returns the first n rows; here n=3, i.e. the rows at positions 0, 1 and 2
dfh

|     | vegetables | color  | unit | min | max |
| --- | ---------- | ------ | ---- | --- | --- |
| 0   | beetroot   | purple | 1kg  | 15  | 40  |
| 1   | carrot     | orange | 1kg  | 25  | 45  |
| 2   | tomato     | red    | 1kg  | 53  | 90  |


In [23]:
dft=market.tail(2)   #tail(n) returns the last n rows; here n=2, i.e. the last two rows
dft

|     | vegetables | color | unit   | min       | max |
| --- | ---------- | ----- | ------ | --------- | --- |
| 3   | kovakkai   | green | 1kg    | 30        | 45  |
| 4   | coconut    | white | 1piece | seventeen | 38  |
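
Beyond the calls shown above, `head` and `tail` default to `n=5` and also accept a negative `n`, which drops rows from the opposite end. A short sketch; the variable names are illustrative only:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

first_default = market.head()        # n defaults to 5
all_but_last_two = market.head(-2)   # everything except the last 2 rows
all_but_first_two = market.tail(-2)  # everything except the first 2 rows

print(list(all_but_last_two.index))
print(list(all_but_first_two.index))
```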


In [25]:
dfs=market.describe()   #Summary stats for numeric columns only ('min' has dtype object, so it is skipped)
dfs

|       | max       |
| ----- | --------- |
| count | 5.0       |
| mean  | 51.6      |
| std   | 21.686401 |
| min   | 38.0      |
| 25%   | 40.0      |
| 50%   | 45.0      |
| 75%   | 45.0      |
| max   | 90.0      |


***Summary stats for cols***

| Statistic | Meaning                             |
| --------- | ----------------------------------- |
| `count`   | Number of non-null values           |
| `mean`    | Average value                       |
| `std`     | Standard deviation (spread of data) |
| `min`     | Minimum value                       |
| `25%`     | 25th percentile (Q1)                |
| `50%`     | Median (Q2)                         |
| `75%`     | 75th percentile (Q3)                |
| `max`     | Maximum value                       |
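
Only `max` is summarized because `describe()` looks at numeric columns by default and `min` holds a string. One possible way to include it, sketched here under the assumption that the unparseable entry should simply become missing (`NaN`) rather than be guessed, is `pd.to_numeric` with `errors="coerce"`:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

# Work on a copy so the original frame keeps its raw values.
numeric = market.copy()

# Assumption: 'seventeen' is treated as missing, not converted to 17.
numeric["min"] = pd.to_numeric(numeric["min"], errors="coerce")

stats = numeric.describe()
print(stats)
```

With this sketch the `count` for `min` is 4 rather than 5, because the coerced value is excluded from the statistics.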


In [29]:
top_left_corner_df=market.iloc[:4,:4]  #Rows at positions 0-3 and columns at positions 0-3 (the end position 4 is excluded)
top_left_corner_df

|     | vegetables | color  | unit | min |
| --- | ---------- | ------ | ---- | --- |
| 0   | beetroot   | purple | 1kg  | 15  |
| 1   | carrot     | orange | 1kg  | 25  |
| 2   | tomato     | red    | 1kg  | 53  |
| 3   | kovakkai   | green  | 1kg  | 30  |
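
`iloc` slices by position and excludes the end position. For comparison, here is a sketch of the same selection with label-based `loc`, whose slices include both endpoints:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

# Position-based: end position 4 is excluded on both axes.
by_position = market.iloc[:4, :4]

# Label-based: row label 3 and column label 'min' are included.
by_label = market.loc[:3, "vegetables":"min"]

print(by_position.equals(by_label))
```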


**ii. DataFrame non-indexing attributes**

In [32]:
transpose=market.T   #Transpose: rows become columns and columns become rows
transpose

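|            | 0        | 1      | 2      | 3        | 4         |
| ---------- | -------- | ------ | ------ | -------- | --------- |
| vegetables | beetroot | carrot | tomato | kovakkai | coconut   |
| color      | purple   | orange | red    | green    | white     |
| unit       | 1kg      | 1kg    | 1kg    | 1kg      | 1piece    |
| min        | 15       | 25     | 53     | 30       | seventeen |
| max        | 40       | 45     | 90     | 45       | 38        |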
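
Because `market` is square (5 × 5), the effect of `.T` on the shape is easy to miss. A small sketch on a two-row subset makes the swap of axes visible; `subset` and `flipped` are illustrative names:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

subset = market.head(2)   # 2 rows x 5 columns
flipped = subset.T        # 5 rows x 2 columns

print(subset.shape, flipped.shape)
print(list(flipped.index))   # the old column names are now the row labels
```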


In [37]:
l=market.axes   #List holding the row index and the column index
l


[RangeIndex(start=0, stop=5, step=1),
 Index(['vegetables', 'color', 'unit', 'min', 'max'], dtype='object')]

In [39]:
l[0]  # Row index (labels 0 to 4; stop=5 is exclusive)

RangeIndex(start=0, stop=5, step=1)

In [40]:
l[1]  # Column names


Index(['vegetables', 'color', 'unit', 'min', 'max'], dtype='object')

In [34]:
(ri,ci)=market.axes      #Unpack the axes list into the row index (ri) and the column index (ci)
(ri,ci)

(RangeIndex(start=0, stop=5, step=1),
 Index(['vegetables', 'color', 'unit', 'min', 'max'], dtype='object'))

* `list(ci)` → gives a list of column names
* `len(ri)` → gives the number of rows
* `'min' in ci` → checks if column 'min' exists

In [42]:
list(ci)    #list(ci) → gives a list of column names

['vegetables', 'color', 'unit', 'min', 'max']

In [43]:
len(ri)     #len(ri) → gives the number of rows

5

In [44]:
'min' in ci   #'min' in ci → checks if column 'min' exists

True
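
The two entries of `axes` are also available directly as `market.index` and `market.columns`. A brief sketch of the same checks using those attributes; the column name `'price'` is a hypothetical name that does not exist in the data:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

ri, ci = market.axes

same_rows = ri.equals(market.index)      # row axis == market.index
same_cols = ci.equals(market.columns)    # column axis == market.columns

has_min = "min" in market.columns        # membership test on column labels
has_price = "price" in market.columns    # hypothetical column, not present

print(same_rows, same_cols, has_min, has_price)
```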

In [46]:
market

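|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 4   | coconut    | white  | 1piece | seventeen | 38  |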


**dtypes**

`dtypes` returns a Series giving the data type (dtype) of each column in the DataFrame `market`.

In [45]:
s=market.dtypes   #Series column datatypes
s

vegetables    object
color         object
unit          object
min           object
max            int64
dtype: object

***Data Types of Columns***

The `min` column is reported as `object` because it contains the string `'seventeen'` alongside the integers.

| Column      | Data Type |
|-------------|------------|
| vegetables  | object     |
| color       | object     |
| unit        | object     |
| min         | object     |
| max         | int64      |
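
Once the dtypes are known, columns can be picked by type. A minimal sketch using `select_dtypes`, which filters columns on their dtype:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

# Keep only numeric columns.
numeric_cols = market.select_dtypes(include="number")

# Keep everything that is not numeric.
other_cols = market.select_dtypes(exclude="number")

print(list(numeric_cols.columns))
print(list(other_cols.columns))
```

Note that `min` lands in the non-numeric group for the reason given above: a single string value makes the whole column non-numeric.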


**empty**

`empty` checks whether the DataFrame has no elements. It is `True` if the DataFrame is empty and `False` otherwise.

In [53]:
b=market.empty   #True if the DataFrame has no elements, otherwise False
b

False
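
To see when `empty` becomes `True`, here is a hedged sketch with a few small frames built only for illustration. A frame containing only `NaN` still has a row, so it does not count as empty.

```python
import pandas as pd

market = pd.DataFrame({
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "max": [40, 45, 90, 45, 38],
})

# Columns defined, but no rows.
no_rows = pd.DataFrame(columns=["vegetables", "max"])

# A filter that matches nothing also yields an empty frame.
too_expensive = market[market["max"] > 100]

# One row holding a missing value: not empty.
only_nan = pd.DataFrame({"max": [float("nan")]})

print(no_rows.empty, too_expensive.empty, only_nan.empty)
```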

**ndim**

ndim stands for number of dimensions.

In pandas:

A DataFrame is always 2-dimensional → rows and columns.

A Series (single column or list-like data) is 1-dimensional.

In [54]:
i=market.ndim   #Number of axes (2 for a DataFrame)
i

2
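
The difference between the two cases can be checked directly. In this sketch, selecting one column with single brackets gives a Series, while double brackets keep a one-column DataFrame:

```python
import pandas as pd

market = pd.DataFrame({
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "max": [40, 45, 90, 45, 38],
})

frame_ndim = market.ndim             # whole DataFrame
series_ndim = market["max"].ndim     # single brackets -> Series
one_col_ndim = market[["max"]].ndim  # double brackets -> one-column DataFrame

print(frame_ndim, series_ndim, one_col_ndim)
```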

**shape**

It returns a tuple: (rows, columns)

In [56]:
t=market.shape    #(row-count,column-count)
t

(5, 5)

In [57]:
market

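|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 4   | coconut    | white  | 1piece | seventeen | 38  |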


**size**

`size` returns the **total number of individual elements (cells)** in the DataFrame `market`. It is calculated as:

market.size = number of rows × number of columns

The DataFrame has:

* `5` rows
* `5` columns

market.size = 5 × 5 = 25

So `i` will be 25.


In [59]:
i=market.size    #Returns  total number of elements
i

25
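
The relationship between `size` and `shape` can be verified in code. A minimal sketch:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

rows, cols = market.shape          # unpack the (rows, columns) tuple
total = market.size                # rows * cols
column_size = market["max"].size   # a Series has one element per row

print(rows, cols, total, column_size)
```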

**values**

`values` converts the entire DataFrame `market` into a 2D NumPy array (rows × columns), where each row is a record and each column is a feature.

All data is converted to a common type (often `object` if mixed types are present).


In [61]:
a=market.values       #Get a 2D NumPy array for the market DataFrame
a

array([['beetroot', 'purple', '1kg', 15, 40],
       ['carrot', 'orange', '1kg', 25, 45],
       ['tomato', 'red', '1kg', 53, 90],
       ['kovakkai', 'green', '1kg', 30, 45],
       ['coconut', 'white', '1piece', 'seventeen', 38]], dtype=object)
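
The common type depends on which columns are converted. As a sketch, converting only the all-integer `max` column keeps an integer dtype, so numeric operations work on the result:

```python
import numpy as np
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

mixed = market.to_numpy()               # mixed columns -> object array
max_values = market["max"].to_numpy()   # one integer column -> integer array

print(mixed.dtype)
print(max_values.dtype)
print(max_values.sum())
```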

**iii. DataFrame utility methods**

**copy**

`copy()` creates an independent copy of the DataFrame `market`, stored here in a new variable `df`.


In [66]:
df=market.copy()   #Copy the market DataFrame and store it in df
df

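|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 4   | coconut    | white  | 1piece | seventeen | 38  |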
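
To illustrate what "independent" means, this sketch changes a value in the copy and checks that the original is untouched. By contrast, plain assignment (`alias = market`) only creates a second name for the same object.

```python
import pandas as pd

market = pd.DataFrame({
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "max": [40, 45, 90, 45, 38],
})

df = market.copy()      # independent copy
df.loc[0, "max"] = 99   # modify the copy only

alias = market          # no copy: just another name for market

print(market.loc[0, "max"])   # original value is unchanged
print(df.loc[0, "max"])
print(alias is market)
```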


**sort_values**

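`sort_values` sorts the DataFrame `df` in ascending order (the default) based on the values in the `'max'` column.

Syntax: `sort_values(by=column_name)`

In [70]:
df=df.sort_values(by='max')   #sort_values returns a new DataFrame, which is assigned back to df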
df

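|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 4   | coconut    | white  | 1piece | seventeen | 38  |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |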
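
Sorting keeps the original index labels, as the output above shows. A hedged sketch of two common variations: descending order via `ascending=False`, and renumbering the rows with `reset_index(drop=True)`.

```python
import pandas as pd

market = pd.DataFrame({
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "max": [40, 45, 90, 45, 38],
})

# Highest 'max' first; market itself is not modified.
by_max_desc = market.sort_values(by="max", ascending=False)

# Replace the shuffled index labels with 0, 1, 2, ...
renumbered = by_max_desc.reset_index(drop=True)

print(by_max_desc)
print(renumbered)
```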


In [79]:
df=df.sort_values(by=['max','min'])   #Sort by 'max' first; rows with equal 'max' are then ordered by 'min'
df

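|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 4   | coconut    | white  | 1piece | seventeen | 38  |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |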
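
When sorting by several columns, `ascending` can also be a list with one flag per column. This sketch uses `'unit'` and `'max'` (a choice made only for illustration) to sort units alphabetically and, within each unit, the highest `max` first:

```python
import pandas as pd

market = pd.DataFrame({
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "max": [40, 45, 90, 45, 38],
})

# 'unit' ascending, then 'max' descending inside each unit group.
ordered = market.sort_values(by=["unit", "max"], ascending=[True, False])

print(ordered)
```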


In [80]:
df  #df was reassigned to the sorted result above, so the DataFrame is re-created below to restore the original order

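|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 4   | coconut    | white  | 1piece | seventeen | 38  |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |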


In [81]:
data={'vegetables':['beetroot','carrot','tomato','kovakkai','coconut'],
      'color':['purple','orange','red','green','white'],
      'unit':['1kg','1kg','1kg','1kg','1piece'],
      'min':[15,25,53,30,'seventeen'],
      'max':[40,45,90,45,38],
     }
df=pd.DataFrame(data)
df

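|     | vegetables | color  | unit   | min       | max |
| --- | ---------- | ------ | ------ | --------- | --- |
| 0   | beetroot   | purple | 1kg    | 15        | 40  |
| 1   | carrot     | orange | 1kg    | 25        | 45  |
| 2   | tomato     | red    | 1kg    | 53        | 90  |
| 3   | kovakkai   | green  | 1kg    | 30        | 45  |
| 4   | coconut    | white  | 1piece | seventeen | 38  |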
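
Re-creating the DataFrame from the dictionary works. As an alternative sketch, because `sort_values` keeps the original index labels, `sort_index()` can bring a sorted frame back to its original row order without rebuilding it:

```python
import pandas as pd

data = {
    "vegetables": ["beetroot", "carrot", "tomato", "kovakkai", "coconut"],
    "color": ["purple", "orange", "red", "green", "white"],
    "unit": ["1kg", "1kg", "1kg", "1kg", "1piece"],
    "min": [15, 25, 53, 30, "seventeen"],
    "max": [40, 45, 90, 45, 38],
}
market = pd.DataFrame(data)

shuffled = market.sort_values(by="max")   # index labels travel with the rows
restored = shuffled.sort_index()          # order rows by index label again

print(restored.equals(market))
```

This only applies while the original index labels are still attached; after `reset_index(drop=True)` they are gone and the original order cannot be recovered this way.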
