# Pandas

## Introduction

* Pandas is a Python library used for working with data sets.
* It has functions for analyzing, cleaning, exploring, and manipulating data.
* "Pandas" was created by Wes McKinney in 2008.
* The name refers to both "Panel Data" and "Python Data Analysis".


## Installation
```shell
pip install pandas
```

## Import
```python
import pandas
```
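
Most pandas code imports the library under the short alias `pd`, which the later examples in this notebook also use. A minimal sketch to confirm that the import works; the printed version string depends on the local installation:

```python
import pandas as pd

# The version string depends on which pandas release is installed locally
print(pd.__version__)
```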

## Dataframe
In pandas, a DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

```python 
pandas.DataFrame(data, index, columns, dtype, copy)
```
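
A small sketch of how the constructor arguments fit together. The values, labels, and the name `scores` are made up for illustration: `data` holds the values, `index` the row labels, `columns` the column labels, and `dtype` forces a single element type.

```python
import pandas as pd

# data: the values; index: row labels; columns: column labels; dtype: element type
scores = pd.DataFrame(
    data=[[1, 2], [3, 4]],
    index=["row1", "row2"],
    columns=["a", "b"],
    dtype="float64",
)

print(scores)
print(scores.dtypes)
```

All arguments are optional, so `pd.DataFrame()` on its own returns an empty frame.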

#### Examples

In [1]:
## Example 1

import pandas

mydataset = {
  'Country': ["India", "USA", "Japan", 'Switzerland'],
  'CountryCode': [91, 1, 81, 41],
  'Currency': ['Rupee', 'Dollar', 'Yen', 'Franc']
}

myvar = pandas.DataFrame(mydataset)

print(myvar)


       Country  CountryCode Currency
0        India           91    Rupee
1          USA            1   Dollar
2        Japan           81      Yen
3  Switzerland           41    Franc


In [2]:
## Example 1, rewritten with the pd alias

import pandas as pd

mydataset = {
  'Country': ["India", "USA", "Japan", 'Switzerland'],
  'Code': [91, 1, 81, 41],
  'Currency': ['Rupee', 'Dollar', 'Yen', 'Franc']
}

myvar = pd.DataFrame(mydataset)

print(myvar)

       Country  Code Currency
0        India    91    Rupee
1          USA     1   Dollar
2        Japan    81      Yen
3  Switzerland    41    Franc


#### Example 2

In [3]:
import pandas as pd

In [29]:
df = pd.DataFrame([['India', 91, 'Rupee'], ['USA', 1, 'Dollar'], ['Japan', 81, 'Yen'], ['Switzerland', 41, 'Franc']],
                  columns=['Country', 'Code', 'Currency'],
                  index=["a", "b", "c", "d"])

In [30]:
print(df)

       Country  Code Currency
a        India    91    Rupee
b          USA     1   Dollar
c        Japan    81      Yen
d  Switzerland    41    Franc


In [31]:
df.head()

       Country  Code Currency
a        India    91    Rupee
b          USA     1   Dollar
c        Japan    81      Yen
d  Switzerland    41    Franc


In [32]:
df.tail()

       Country  Code Currency
a        India    91    Rupee
b          USA     1   Dollar
c        Japan    81      Yen
d  Switzerland    41    Franc
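
With only four rows, `head()` and `tail()` both return the whole frame, because each returns five rows by default. A sketch on a hypothetical ten-row frame (`numbers` is not part of the example above) makes the difference visible:

```python
import pandas as pd

numbers = pd.DataFrame({"n": range(10)})

first_five = numbers.head()    # default: the first 5 rows
last_three = numbers.tail(3)   # explicit count: the last 3 rows

print(first_five)
print(last_three)
```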


In [33]:
df.columns

Index(['Country', 'Code', 'Currency'], dtype='object')

In [34]:
df.index.tolist()

['a', 'b', 'c', 'd']

In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, a to d
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Country   4 non-null      object
 1   Code      4 non-null      int64 
 2   Currency  4 non-null      object
dtypes: int64(1), object(2)
memory usage: 128.0+ bytes


#### df.describe() 
The `df.describe()` method generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for a DataFrame's columns. By default it summarizes only the numeric columns.

In [38]:
import pandas as pd
import numpy as np

# Create a funny student dataset
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'age': [20, 22, 21, 23, 20, 24, 19, 22],
    'height_cm': [165, 180, 175, 168, 160, 185, 162, 178],
    'weight_kg': [55, 75, 70, 58, 52, 80, 54, 76],
    'exam_score': [85, 62, 78, 92, 88, 55, 95, 65],
    'sleep_hours': [7, 5, 6, 8, 7, 4, 9, 6],
    'coffee_cups': [2, 4, 3, 1, 2, 5, 0, 3]
}

df = pd.DataFrame(data)
print("Student Dataset")
print(df)

Student Dataset
      name  age  height_cm  weight_kg  exam_score  sleep_hours  coffee_cups
0    Alice   20        165         55          85            7            2
1      Bob   22        180         75          62            5            4
2  Charlie   21        175         70          78            6            3
3    Diana   23        168         58          92            8            1
4      Eve   20        160         52          88            7            2
5    Frank   24        185         80          55            4            5
6    Grace   19        162         54          95            9            0
7    Henry   22        178         76          65            6            3


In [40]:
print("Basic describe() - All numerical columns:")
print(df.describe())

Basic describe() - All numerical columns:
             age   height_cm  weight_kg  exam_score  sleep_hours  coffee_cups
count   8.000000    8.000000   8.000000     8.00000     8.000000     8.000000
mean   21.375000  171.625000  65.000000    77.50000     6.500000     2.500000
std     1.685018    9.148575  11.401754    15.05229     1.603567     1.603567
min    19.000000  160.000000  52.000000    55.00000     4.000000     0.000000
25%    20.000000  164.250000  54.750000    64.25000     5.750000     1.750000
50%    21.500000  171.500000  64.000000    81.50000     6.500000     2.500000
75%    22.250000  178.500000  75.250000    89.00000     7.250000     3.250000
max    24.000000  185.000000  80.000000    95.00000     9.000000     5.000000


In [42]:
print("🎯 describe() on exam scores:")
print(df['exam_score'].describe())

🎯 describe() on exam scores:
count     8.00000
mean     77.50000
std      15.05229
min      55.00000
25%      64.25000
50%      81.50000
75%      89.00000
max      95.00000
Name: exam_score, dtype: float64


In [43]:
print("Sleep and Coffee Stats:")
print(df[['sleep_hours', 'coffee_cups']].describe())

Sleep and Coffee Stats:
       sleep_hours  coffee_cups
count     8.000000     8.000000
mean      6.500000     2.500000
std       1.603567     1.603567
min       4.000000     0.000000
25%       5.750000     1.750000
50%       6.500000     2.500000
75%       7.250000     3.250000
max       9.000000     5.000000
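
The calls above skip the non-numeric `name` column. The following sketch, which assumes a small hypothetical `students` frame, shows two optional parameters: `include="all"` adds summaries for text columns, and `percentiles` chooses which percentile rows are reported.

```python
import pandas as pd

students = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Bob"],
    "exam_score": [85, 62, 78, 70],
})

# include="all" also summarizes text columns (count, unique, top, freq)
full_summary = students.describe(include="all")
print(full_summary)

# percentiles= selects which percentile rows are reported
custom = students["exam_score"].describe(percentiles=[0.1, 0.9])
print(custom)
```

In the combined summary, statistics that do not apply to a column (for example `mean` for `name`) are shown as `NaN`.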


In [44]:
df.shape

(8, 7)

In [45]:
df.size

56
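
`shape` is a `(rows, columns)` tuple and `size` is the total number of cells, so `size` equals rows times columns (8 × 7 = 56 above). A minimal sketch on a hypothetical frame named `grid`:

```python
import pandas as pd

grid = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

n_rows, n_cols = grid.shape   # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(grid.size)              # total number of cells: rows * columns
print(len(grid))              # number of rows only
```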


### Creating a DataFrame

In [6]:
# Converting a list into a DataFrame
list1 = [10,20,30,40]
data = pd.DataFrame(list1)
print(data)

    0
0  10
1  20
2  30
3  40
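
In the output above the single column is labeled `0` because no name was given. A short sketch of naming columns at construction time; the column names `value`, `number`, and `letter` are illustrative choices:

```python
import pandas as pd

list1 = [10, 20, 30, 40]

# columns= gives the single column a readable name instead of the default 0
named = pd.DataFrame(list1, columns=["value"])
print(named)

# A list of lists becomes one row per inner list
pairs = pd.DataFrame([[1, "a"], [2, "b"]], columns=["number", "letter"])
print(pairs)
```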


### DataFrame - Addition & Deletion of Columns

In [64]:
import pandas as pd

# Step 1: Create a dictionary of authors and their books
authors_books = {
    "Fyodor Dostoevsky": [
        "Crime and Punishment",
        "The Brothers Karamazov",
        "The Idiot",
        "Notes from Underground"
    ],
    "Leo Tolstoy": [
        "War and Peace",
        "Anna Karenina",
        "The Death of Ivan Ilyich"
    ],
    "Franz Kafka": [
        "The Metamorphosis",
        "The Trial",
        "The Castle"
    ],
    "Khalil Gibran": [
        "The Prophet",
        "Sand and Foam",
        "The Madman"
    ],
    "George Orwell": [
        "1984",
        "Animal Farm",
        "Homage to Catalonia"
    ],
    "Anton Chekhov": [
        "The Cherry Orchard",
        "Uncle Vanya",
        "The Seagull"
    ],
    "Nikolai Gogol": [
        "Dead Souls",
        "The Overcoat",
        "The Nose"
    ],
    "Maxim Gorky": [
        "The Mother",
        "My Childhood"
    ],
    "Ivan Turgenev": [
        "Fathers and Sons",
        "A Month in the Country"
    ],
    "Alexander Pushkin": [
        "Eugene Onegin",
        "The Queen of Spades"
    ]
}

# Step 2: Convert the dictionary into a list of records for DataFrame
data = []
for author, books in authors_books.items():
    for book in books:
        data.append({"Author": author, "Book": book})

# Step 3: Create a pandas DataFrame
df = pd.DataFrame(data)

# Step 4: Display the DataFrame
print(df)

# (Optional) Step 5: Save it to a CSV file
df.to_csv("famous_authors_books.csv", index=False)


               Author                      Book
0   Fyodor Dostoevsky      Crime and Punishment
1   Fyodor Dostoevsky    The Brothers Karamazov
2   Fyodor Dostoevsky                 The Idiot
3   Fyodor Dostoevsky    Notes from Underground
4         Leo Tolstoy             War and Peace
5         Leo Tolstoy             Anna Karenina
6         Leo Tolstoy  The Death of Ivan Ilyich
7         Franz Kafka         The Metamorphosis
8         Franz Kafka                 The Trial
9         Franz Kafka                The Castle
10      Khalil Gibran               The Prophet
11      Khalil Gibran             Sand and Foam
12      Khalil Gibran                The Madman
13      George Orwell                      1984
14      George Orwell               Animal Farm
15      George Orwell       Homage to Catalonia
16      Anton Chekhov        The Cherry Orchard
17      Anton Chekhov               Uncle Vanya
18      Anton Chekhov               The Seagull
19      Nikolai Gogol                Dead Souls
[output truncated]

In [65]:
# Create a Series for publication years
years = pd.Series([
    1866,  # Crime and Punishment
    1880,  # The Brothers Karamazov
    1869,  # The Idiot
    1864,  # Notes from Underground
    1869,  # War and Peace
    1877,  # Anna Karenina
    1886,  # The Death of Ivan Ilyich
    1915,  # The Metamorphosis
    1925,  # The Trial (posthumous)
    1926   # The Castle (posthumous)
])

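# Assignment aligns on the index: rows 10 and later have no matching label in years, so they get NaN
df["Year"] = years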
print(df)

               Author                      Book    Year
0   Fyodor Dostoevsky      Crime and Punishment  1866.0
1   Fyodor Dostoevsky    The Brothers Karamazov  1880.0
2   Fyodor Dostoevsky                 The Idiot  1869.0
3   Fyodor Dostoevsky    Notes from Underground  1864.0
4         Leo Tolstoy             War and Peace  1869.0
5         Leo Tolstoy             Anna Karenina  1877.0
6         Leo Tolstoy  The Death of Ivan Ilyich  1886.0
7         Franz Kafka         The Metamorphosis  1915.0
8         Franz Kafka                 The Trial  1925.0
9         Franz Kafka                The Castle  1926.0
10      Khalil Gibran               The Prophet     NaN
11      Khalil Gibran             Sand and Foam     NaN
12      Khalil Gibran                The Madman     NaN
13      George Orwell                      1984     NaN
14      George Orwell               Animal Farm     NaN
15      George Orwell       Homage to Catalonia     NaN
16      Anton Chekhov        The Cherry Orchard     NaN
[output truncated]
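
The `NaN` values above come from index alignment: a Series assigned as a column is matched to rows by index label, and rows without a match are left missing. Because `NaN` is a float, the whole column is stored as `float64`, which is why the years print as `1866.0`. A minimal sketch with hypothetical titles:

```python
import pandas as pd

books = pd.DataFrame({"Book": ["A", "B", "C"]})   # hypothetical titles

# The Series covers index labels 0 and 1 only, so row 2 gets NaN
books["Year"] = pd.Series([1866, 1880])
print(books)

# A plain list has no index, so its length must match the row count exactly
books["Pages"] = [100, 200, 300]
print(books)
```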

In [66]:
# A column can be deleted with the del statement.
# df.pop("Year") would also remove it, returning the removed column.
del df["Year"]

In [67]:
print(df)

               Author                      Book
0   Fyodor Dostoevsky      Crime and Punishment
1   Fyodor Dostoevsky    The Brothers Karamazov
2   Fyodor Dostoevsky                 The Idiot
3   Fyodor Dostoevsky    Notes from Underground
4         Leo Tolstoy             War and Peace
5         Leo Tolstoy             Anna Karenina
6         Leo Tolstoy  The Death of Ivan Ilyich
7         Franz Kafka         The Metamorphosis
8         Franz Kafka                 The Trial
9         Franz Kafka                The Castle
10      Khalil Gibran               The Prophet
11      Khalil Gibran             Sand and Foam
12      Khalil Gibran                The Madman
13      George Orwell                      1984
14      George Orwell               Animal Farm
15      George Orwell       Homage to Catalonia
16      Anton Chekhov        The Cherry Orchard
17      Anton Chekhov               Uncle Vanya
18      Anton Chekhov               The Seagull
19      Nikolai Gogol                Dead Souls
[output truncated]
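
Besides `del`, two other common ways to remove a column are `pop()` and `drop()`. The sketch below uses a small hypothetical `books` frame to show how they differ: `pop()` changes the frame in place and hands back the column, while `drop()` returns a new frame.

```python
import pandas as pd

books = pd.DataFrame({
    "Author": ["Franz Kafka", "Leo Tolstoy"],
    "Book": ["The Trial", "War and Peace"],
    "Year": [1925, 1869],
})

# pop() removes the column in place and returns it as a Series
years = books.pop("Year")
print(years.tolist())

# drop() returns a new frame and leaves the original untouched
authors_only = books.drop(columns=["Book"])
print(list(books.columns))
print(list(authors_only.columns))
```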

In [68]:
## Rows can be selected with the .loc[] (label-based) or .iloc[] (position-based) indexers
print(df.iloc[0:4])

              Author                    Book
0  Fyodor Dostoevsky    Crime and Punishment
1  Fyodor Dostoevsky  The Brothers Karamazov
2  Fyodor Dostoevsky               The Idiot
3  Fyodor Dostoevsky  Notes from Underground


In [69]:
# Get all books written by Franz Kafka
kafka_books = df.loc[df["Author"] == "Franz Kafka"]
print(kafka_books)

        Author               Book
7  Franz Kafka  The Metamorphosis
8  Franz Kafka          The Trial
9  Franz Kafka         The Castle
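
The two indexers treat slices differently, which is easiest to see when the index is not the default 0, 1, 2. A sketch on a hypothetical frame named `df_labeled` with string labels:

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {"Code": [91, 1, 81]},
    index=["a", "b", "c"],
)

# iloc slices by position and excludes the end position
by_position = df_labeled.iloc[0:2]

# loc slices by label and includes the end label
by_label = df_labeled.loc["a":"c"]

# loc also accepts a boolean mask, optionally with a column selection
large_codes = df_labeled.loc[df_labeled["Code"] > 50, "Code"]

print(by_position)
print(by_label)
print(large_codes)
```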


In [70]:
## Rows can be added with pd.concat(); the older DataFrame.append() method was removed in pandas 2.0
new_books = [
    {"Author": "Oscar Wilde", "Book": "The Picture of Dorian Gray"},
    {"Author": "Oscar Wilde", "Book": "The Importance of Being Earnest"},
    {"Author": "Oscar Wilde", "Book": "De Profundis"}
]

# Convert to DataFrame
oscar_df = pd.DataFrame(new_books)

df = pd.concat([df, oscar_df], ignore_index=True)
print(df)

               Author                             Book
0   Fyodor Dostoevsky             Crime and Punishment
1   Fyodor Dostoevsky           The Brothers Karamazov
2   Fyodor Dostoevsky                        The Idiot
3   Fyodor Dostoevsky           Notes from Underground
4         Leo Tolstoy                    War and Peace
5         Leo Tolstoy                    Anna Karenina
6         Leo Tolstoy         The Death of Ivan Ilyich
7         Franz Kafka                The Metamorphosis
8         Franz Kafka                        The Trial
9         Franz Kafka                       The Castle
10      Khalil Gibran                      The Prophet
11      Khalil Gibran                    Sand and Foam
12      Khalil Gibran                       The Madman
13      George Orwell                             1984
14      George Orwell                      Animal Farm
15      George Orwell              Homage to Catalonia
16      Anton Chekhov               The Cherry Orchard
17      Anton Chekhov                      Uncle Vanya
[output truncated]
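
The `ignore_index=True` argument is what keeps the row labels running on from the existing rows. A minimal sketch with two hypothetical one-row frames shows what happens with and without it:

```python
import pandas as pd

first = pd.DataFrame({"Author": ["Franz Kafka"], "Book": ["The Trial"]})
second = pd.DataFrame({"Author": ["Oscar Wilde"], "Book": ["De Profundis"]})

# Without ignore_index each frame keeps its own labels, so 0 appears twice
kept = pd.concat([first, second])

# With ignore_index=True the result is relabeled 0, 1, ...
renumbered = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Duplicate labels are allowed, but they make label-based selection with `.loc[]` return several rows, so renumbering is usually the safer choice when appending.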

### Importing and Exporting Data

In [73]:
# Data stored in CSV format can be loaded into a DataFrame with read_csv().
# The file must already exist at the given path; otherwise read_csv() raises FileNotFoundError.
table_csv = pd.read_csv('../input/Cars2015.csv')

# A DataFrame can be written to a CSV file with to_csv().
# If the file doesn't exist it is created; if it does exist it is overwritten.
# The parent directory must already exist.
table_csv.to_csv('newcars2015.csv')



# Data stored in an Excel spreadsheet can be loaded with read_excel()
# (reading .xlsx files requires an engine such as openpyxl to be installed).
sheet = pd.read_excel('cars2015.xlsx')


# A DataFrame can be written to a spreadsheet file with to_excel().
# If the file doesn't exist it is created; if it does exist it is overwritten.
sheet.to_excel('newcars2015.xlsx')

FileNotFoundError: [Errno 2] No such file or directory: '../input/Cars2015.csv'
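
The error above is raised because no file exists at the path passed to `read_csv()`. The following sketch avoids that dependency by writing and reading inside a temporary directory; the `cars` data, the file names, and the variable names are hypothetical stand-ins for the car data set.

```python
import tempfile
from pathlib import Path

import pandas as pd

cars = pd.DataFrame({"Model": ["A", "B"], "Price": [10, 20]})  # hypothetical data

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "cars.csv"

    # index=False keeps the row labels out of the file
    cars.to_csv(csv_path, index=False)
    restored = pd.read_csv(csv_path)

    # Reading a file that does not exist raises FileNotFoundError
    try:
        pd.read_csv(Path(tmp_dir) / "missing.csv")
    except FileNotFoundError as error:
        missing_error = error

print(restored)
print(type(missing_error).__name__)
```

Without `index=False`, `to_csv()` also writes the row labels as an unnamed first column, which then comes back as an extra `Unnamed: 0` column on the next `read_csv()`.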