### Preliminaries

**Start by importing these Python modules**

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

**Check which version of pandas I'm using**

In [4]:
print(pd.__version__)

2.2.2


### The Conceptual Model

Pandas provides two core data types: the DataFrame and the Series.

A DataFrame is a two-dimensional table of data with row and column indexes (something like a spreadsheet). Its columns are Series objects.
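To make the relationship between the two types concrete, here is a minimal sketch. The table and the names `df_demo` and `ages` are illustrative and not part of the notebook's data. It shows that selecting a single column returns a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical demo table, used only for illustration
df_demo = pd.DataFrame({'Name': ['Devika', 'Ashika'], 'Age': [22, 23]})

ages = df_demo['Age']                      # selecting one column
print(type(ages).__name__)                 # Series
print(ages.index.equals(df_demo.index))    # True: same row index
```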

**Series**

A Series is an ordered, one-dimensional array of data with an index. All the data is of the same data type. Series arithmetic is vectorized after first aligning the index of each operand.
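The "same data type" rule can be checked directly. The sketch below uses made-up values and illustrative variable names. It suggests that pandas picks one dtype for the whole Series, upcasting integers to floats when the two are mixed and falling back to the generic `object` dtype when numbers and strings are mixed.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed_numbers = pd.Series([1, 2.5, 3])     # integers are upcast to float
mixed_types = pd.Series([1, 'two', 3.0])   # no common numeric type

print(ints.dtype)
print(mixed_numbers.dtype)   # float64
print(mixed_types.dtype)     # object
```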

**Examples of series arithmetic**

In [8]:
s1=pd.Series(range(0,4))
s1

0    0
1    1
2    2
3    3
dtype: int64

This creates a Series of the numbers from 0 up to (but not including) 4.

In [9]:
s2=pd.Series(range(1,5))
s2

0    1
1    2
2    3
3    4
dtype: int64

This creates a Series of the numbers from 1 up to (but not including) 5.

In [10]:
s3=s1+s2
s3

0    1
1    3
2    5
3    7
dtype: int64

This creates a Series whose elements are the element-wise sums of `s1` and `s2`.

**Creating two pandas Series with the same values [1, 2, 3] but different index orders**

In [12]:
s4=pd.Series([1,2,3],index=[0,1,2])
s4

0    1
1    2
2    3
dtype: int64

In [13]:
s5=pd.Series([1,2,3],index=[2,1,0])
s5

2    1
1    2
0    3
dtype: int64

| Series | Index Order | Values Assigned |
| ------ | ----------- | --------------- |
| s4     | 0, 1, 2     | 1, 2, 3         |
| s5     | 2, 1, 0     | 1, 2, 3         |


In [14]:
s6=s4+s5
s6

0    4
1    4
2    4
dtype: int64

🧮 How Addition Works in Pandas Series:
Pandas adds elements by matching index labels, not by position.

| Index | s4 Value | s5 Value | Sum |
| ----- | -------- | -------- | --- |
| 0     | 1        | 3        | 4   |
| 1     | 2        | 2        | 4   |
| 2     | 3        | 1        | 4   |
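To see the difference between label-based and position-based addition side by side, the sketch below repeats `s4` and `s5` and also adds their underlying NumPy arrays, which carry no index and therefore add by position. The names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

s4 = pd.Series([1, 2, 3], index=[0, 1, 2])
s5 = pd.Series([1, 2, 3], index=[2, 1, 0])

by_label = s4 + s5                             # aligned on index labels
by_position = s4.to_numpy() + s5.to_numpy()    # plain arrays, no alignment

print(by_label.tolist())      # [4, 4, 4]
print(by_position.tolist())   # [2, 4, 6]
```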


In [15]:
s7=pd.Series([1,2,3],index=[1,2,3])
s7

1    1
2    2
3    3
dtype: int64

In [16]:
s8=pd.Series([1,2,3],index=[0,1,2])
s8

0    1
1    2
2    3
dtype: int64

🧮 Addition: s9 = s7 + s8
Addition happens by matching index labels. Let’s align the indices:

| Index | s7 Value | s8 Value | s7 + s8 |
| ----- | -------- | -------- | ------- |
| 0     | —        | 1        | NaN     |
| 1     | 1        | 2        | 3       |
| 2     | 2        | 3        | 5       |
| 3     | 3        | —        | NaN     |


In [19]:
s9=s7+s8
s9

0    NaN
1    3.0
2    5.0
3    NaN
dtype: float64
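If the NaN values are not wanted, one option is the `Series.add` method with `fill_value`, which substitutes a default for a label that is missing from one of the operands. This is a sketch of that alternative, not something the notebook itself runs, and `s9_filled` is an illustrative name.

```python
import pandas as pd

s7 = pd.Series([1, 2, 3], index=[1, 2, 3])
s8 = pd.Series([1, 2, 3], index=[0, 1, 2])

# Treat a label missing from one Series as 0 instead of producing NaN
s9_filled = s7.add(s8, fill_value=0)
print(s9_filled.tolist())   # [1.0, 3.0, 5.0, 3.0]
```

The result is still `float64`, because the alignment step introduces missing values before they are filled.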

### Get your data into a DataFrame

**Instantiate the DataFrame**

In [24]:
df1=pd.DataFrame() #Empty dataframe
df1


In [25]:
# Define the dictionary first
python_dictionary = {
    'Name': ['Devika', 'Ashika', 'Kavya'],
    'Age': [22, 23, 21],
    'Branch': ['CSE', 'ECE', 'ISE']
}
# Create DataFrame from python_dictionary
df2=pd.DataFrame(python_dictionary)
df2

|     | Name   | Age | Branch |
| --- | ------ | --- | ------ |
| 0   | Devika | 22  | CSE    |
| 1   | Ashika | 23  | ECE    |
| 2   | Kavya  | 21  | ISE    |


In [26]:

# Define the NumPy array (matrix)
numpy_matrix = np.array([[10, 20], [30, 40], [50, 60]])

# Create DataFrame from NumPy matrix
df3=pd.DataFrame(numpy_matrix)
df3

|     | 0   | 1   |
| --- | --- | --- |
| 0   | 10  | 20  |
| 1   | 30  | 40  |
| 2   | 50  | 60  |
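A NumPy matrix has no labels, so pandas falls back to 0, 1, 2, ... for both rows and columns. As a sketch, labels can be supplied through the `columns` and `index` arguments of the constructor. The labels `'low'`, `'high'`, `'a'`, `'b'`, `'c'` and the name `df3_labeled` are made up for illustration.

```python
import numpy as np
import pandas as pd

numpy_matrix = np.array([[10, 20], [30, 40], [50, 60]])

# Hypothetical labels, chosen only for this example
df3_labeled = pd.DataFrame(numpy_matrix,
                           columns=['low', 'high'],
                           index=['a', 'b', 'c'])
print(df3_labeled.loc['b', 'high'])   # 40
```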


**Load a dataframe from a csv file**

In [29]:
df_csv = pd.read_csv('Books.csv', header=0, index_col=0, na_values=['na', '-', '.', ''])
df_csv

| title | author | pages | genre | description | published_date | publisher | language | average_rating | ratings_count | thumbnail |
| ----- | ------ | ----- | ----- | ----------- | -------------- | --------- | -------- | -------------- | ------------- | --------- |
| Fictional Points of View | Peter Lamarque | 252 | Literary Criticism | The volume focuses on a wide range of thinkers... | 1996 | Cornell University Press | en | No rating | 0 | http://books.google.com/books/content?id=rh-om... |
| Science Fiction and Fantasy Literature | R. Reginald, Douglas Menville, Mary A. Burgess | 802 | Reference | Science Fiction and Fantasy Literature, A Chec... | 2010-09-01 | Wildside Press LLC | en | No rating | 0 | http://books.google.com/books/content?id=P8zW2... |
| Library of Congress Subject Headings | Library of Congress. Cataloging Policy and Sup... | 1662 | Subject headings, Library of Congress | No description available | 2004 | Unknown Publisher | en | No rating | 0 | http://books.google.com/books/content?id=pEhkh... |
| Library of Congress Subject Headings | Library of Congress | 1512 | Subject headings, Library of Congress | No description available | 2007 | Unknown Publisher | en | No rating | 0 | http://books.google.com/books/content?id=FgAjF... |
| Fictional Space in the Modernist and Post-modernist American Novel | Carl Darryl Malmgren | 248 | Fiction | Fictional space is the imaginal expanse of fie... | 1985 | Bucknell University Press | en | No rating | 0 | http://books.google.com/books/content?id=KXzoz... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| The Index Card | Helaine Olen, Harold Pollack | 256 | Personal Finance | Simplifies personal finance to ten rules that ... | 2016-01-05 | Portfolio | en | 4.0 | 30000 | http://books.google.com/books/content?id=8z4_D... |
| The Road to Wealth | Suze Orman | 608 | Personal Finance | A comprehensive guide to managing money, inves... | 2001-04-01 | Riverhead Books | en | 4.1 | 50000 | http://books.google.com/books/content?id=zv0oD... |
| The Success Principles | Jack Canfield | 512 | Self-Help | A guide to achieving personal and financial su... | 2004-12-28 | HarperCollins | en | 4.2 | 100000 | http://books.google.com/books/content?id=7zL_D... |
| The Courage to Be Rich | Suze Orman | 448 | Personal Finance | Combines emotional and practical advice for bu... | 1999-03-01 | Riverhead Books | en | 4.0 | 40000 | http://books.google.com/books/content?id=2c3_D... |


| Column Name      | Meaning                                               |
| ---------------- | ----------------------------------------------------- |
| `author`         | Author(s) of the book                                 |
| `pages`          | Total number of pages                                 |
| `genre`          | Category like "Fiction", "Finance", "Reference", etc. |
| `description`    | Summary or blurb about the book                       |
| `published_date` | Year or date of publication                           |
| `publisher`      | Publishing company                                    |
| `language`       | Language (e.g., `"en"` for English)                   |
| `average_rating` | User rating (can be number or text like "No rating")  |
| `ratings_count`  | Number of users who rated it                          |
| `thumbnail`      | URL to the book's cover image thumbnail               |
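Because `average_rating` mixes numbers with the text "No rating", pandas reads the column as strings. One possible way to get a numeric column is `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into NaN. The sketch below uses a tiny stand-in Series rather than the real `Books.csv` data, and the names `ratings` and `numeric` are illustrative.

```python
import pandas as pd

# Stand-in for the average_rating column (values are illustrative)
ratings = pd.Series(['No rating', '4.0', '4.1'])

# Entries that cannot be parsed as numbers become NaN
numeric = pd.to_numeric(ratings, errors='coerce')
print(numeric.isna().tolist())   # [True, False, False]
print(numeric.mean())            # mean() skips NaN by default
```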


| Argument          | What it does                                    |
| ----------------- | ----------------------------------------------- |
| `header=0`        | Uses the first row as column names              |
| `index_col=0`     | Uses the first column as the row index          |
| `na_values=[...]` | Converts listed symbols to missing values (NaN) |
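The effect of these arguments can be tried without the `Books.csv` file. The following sketch feeds a small made-up CSV string to `read_csv` with the same arguments. The column names and values are assumptions for illustration only.

```python
from io import StringIO
import pandas as pd

# Made-up CSV text containing the placeholder symbols 'na', '-' and '.'
csv_text = """title,pages,rating
Book A,100,4.5
Book B,na,-
Book C,250,.
"""

df_demo = pd.read_csv(StringIO(csv_text), header=0, index_col=0,
                      na_values=['na', '-', '.', ''])

print(df_demo.index.name)     # title
print(df_demo.isna().sum())   # count of missing values per column
```

Here `pages` ends up with one missing value and `rating` with two, and both columns are numeric because the placeholders were converted to NaN before type inference.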



In [30]:
df_csv.shape

(2049, 10)

The result shows 2049 rows (books) and 10 columns (data fields). The `title` column is the index, so it is not counted among the columns.
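Since `shape` is an ordinary tuple, it can be unpacked. This is a small sketch with a made-up three-row table; `df_demo`, `n_rows` and `n_cols` are illustrative names.

```python
import pandas as pd

# Hypothetical table with the titles as the index
df_demo = pd.DataFrame({'pages': [252, 802, 248],
                        'language': ['en', 'en', 'en']},
                       index=['Book A', 'Book B', 'Book C'])

n_rows, n_cols = df_demo.shape
print(n_rows, n_cols)   # 3 2  (the index is not counted as a column)
print(len(df_demo))     # 3    (len gives the number of rows)
```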

**Get your data from inline CSV text in Python**

In [34]:
from io import StringIO
data=""",Animal,Cuteness,Desirable
A,       dog,     8.7,     True
B,       cat,     9.5,     False"""

df_inlinedata=pd.read_csv(StringIO(data),header=0,index_col=0,skipinitialspace=True)
df_inlinedata

Unnamed: 0,Animal,Cuteness,Desirable
A,dog,8.7,True
B,cat,9.5,False


| Parameter               | Meaning                                                            |
| ----------------------- | ------------------------------------------------------------------ |
| `StringIO(data)`        | Treats the `data` string like a CSV file.                          |
| `header=0`              | Uses the **first line** as column names.                           |
| `index_col=0`           | Uses the **first column** (which contains A and B) as the index.   |
| `skipinitialspace=True` | Strips any extra spaces **after commas** (e.g., `' dog' → 'dog'`). |
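To illustrate what `skipinitialspace=True` changes, the sketch below parses the same text twice, once without the option and once with it, and compares the `Animal` value for row `A`. The names `raw` and `clean` are illustrative.

```python
from io import StringIO
import pandas as pd

data = """,Animal,Cuteness,Desirable
A,       dog,     8.7,     True
B,       cat,     9.5,     False"""

raw = pd.read_csv(StringIO(data), header=0, index_col=0)
clean = pd.read_csv(StringIO(data), header=0, index_col=0,
                    skipinitialspace=True)

print(repr(raw.loc['A', 'Animal']))     # string keeps its leading spaces
print(repr(clean.loc['A', 'Animal']))   # 'dog'
```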


In [36]:
df_inlinedata.loc['A', 'Animal']     # Output: 'dog'



'dog'

In [37]:
df_inlinedata['Cuteness'].mean()     # Output: 9.1

9.1

**Many other input options are also available**


**1. Reading an HTML string**

In [40]:

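# Reading from an HTML string
from io import StringIO
import pandas as pd

html_string = """
<table>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Devika</td><td>22</td></tr>
    <tr><td>Ashika</td><td>21</td></tr>
</table>
"""

# Wrap the literal HTML in StringIO; passing a raw HTML string is deprecated in recent pandas
df_html = pd.read_html(StringIO(html_string))  # returns a list of DataFrames
df_htmlelement = df_html[0]                    # the first (and only) table
print(df_htmlelement)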


     Name  Age
0  Devika   22
1  Ashika   21




**2. Reading HTML from a URL**


In [46]:
# Reading HTML tables from a URL
import pandas as pd

html_url = "https://www.kaggle.com/datasets"
df_html = pd.read_html(html_url)   # This gives you a list of DataFrames
df = df_html[0]                    # Access the first table
print(df.head())

ImportError: Missing optional dependency 'html5lib'.  Use pip or conda to install html5lib.
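`read_html` relies on optional parser packages (`lxml`, or `bs4` together with `html5lib`). As a sketch, the standard library can report which of them are absent before the call is made. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]

# Parser packages that pandas.read_html can use
print(missing_modules(['lxml', 'bs4', 'html5lib']))
```

An empty list means all three packages are importable; any name that is printed still needs to be installed.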

In [47]:
pip install html5lib

Note: you may need to restart the kernel to use updated packages.

