---
# Introduction to Python

Python is a high-level, interpreted programming language known for its simplicity and readability. It is widely used in web development, data analysis, artificial intelligence, automation, and more.

Some key features of Python:
- Easy to learn and write
- Dynamically typed
- Large standard library
- Supports multiple programming paradigms (procedural, object-oriented, functional)

In this assignment, we will explore basic Python concepts using examples like:
- Lists and sets
- Type conversion
- Adding and updating data
- Handling errors


### Topic 01:  Printing "Hello, World!" using the `print()` function

The `print()` function in Python displays output on the screen.  
It sends the given message or value to standard output.


In [1]:
print ("hello world")

hello world


### Variables 

A variable is like a container that stores data in your program.
In Python, variables do not need an explicit declaration to reserve memory.
The type of a variable is determined by the value assigned to it.

In [4]:
counter = 100.00
miles = 100
name = "Hamza"
print(counter)
print(miles)
print(name)

100.0
100
Hamza
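
Because the type belongs to the value rather than to the name, the same name can refer to values of different types over time. The sketch below illustrates this with the built-in `type()` function; the name `value` is chosen only for illustration.

```python
# The same name can be bound to values of different types.
value = 100            # an int
first_type = type(value)

value = 100.0          # now a float
second_type = type(value)

value = "Hamza"        # now a str
third_type = type(value)

print(first_type, second_type, third_type)
```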


### Topic 2: Numbers

Python can work with whole numbers (integers) and decimal numbers (floats).  
You can do math using operators like `+`, `-`, `*`, `/`, `**`.

**What each line does**:
- `2` → Just prints the number
- `2 + 3` → Adds numbers = 5
- `5 - 2` → Subtracts = 3
- `10 * 2` → Multiplies = 20
- `10 / 2` → Divides = 5.0
- `2 ** 3` → 2 raised to power 3 = 8


### What the Code Shows

- Whole numbers such as `2` or `10` are integers (`int`)
- Numbers with a decimal point such as `3.14` are floats (`float`)
- `/` always returns a float, which is why `10 / 2` prints `5.0`
- `2 ** 3` → Exponentiation: 2 raised to the power 3 = 8


In [3]:
print(2)            # Integer
print(2 + 3)        # Addition
print(5 - 2)        # Subtraction
print(10 * 2)       # Multiplication
print(10 / 2)       # Division
print(2 ** 3)       # Power (2^3)

2
5
3
20
5.0
8


### Topic 3: Strings 

* Strings store text: words or combinations of words made of letters, digits, special characters, etc.
* A string is written by enclosing it in single quotes or double quotes.
* Python has no separate `char` data type; a single character is simply a string of length one.


In [1]:
name= "Babar" 
University = "Nust"
print(name,University)

Babar Nust


### Accessing Values in a String

In Python, we can access specific characters in a string using **indexing**.

- Indexing starts from **0**
- `string[0]` → gives the **first character**
- `string[1:4]` → gives a **range of characters** from index 1 to 3 (not including 4)


In [12]:
name = "Hamza"
print(name[0])     # H
print(name[1:4])   # amz

H
amz


###  Updating a String 

Strings in Python are **immutable**, so we cannot change individual characters directly.  
However, we can **create a new string** by slicing the old one and adding new text.


In [2]:
stringVariable = "Hello programmer"
stringVariable = stringVariable[:6] + "Python"
print("Updated string:", stringVariable)

Updated string: Hello Python
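
To see what "immutable" means in practice, the sketch below first tries to assign to a single character and then builds a new string instead. The variable names are illustrative only.

```python
greeting = "Hello programmer"

try:
    greeting[0] = "J"            # item assignment is not allowed on str
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# Building a new string works; the original value is never modified.
new_greeting = "J" + greeting[1:]
print(new_greeting)
```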


###  Deleting a String Variable in Python

You can delete an entire string variable using the `del` keyword.

####  Example:
```python
stringVariable = "Hello World"
del stringVariable
print(stringVariable)  # Raises NameError: the name no longer exists after del
```


In [14]:
stringVariable = "Hello World"
print(stringVariable)  # Output: Hello World

del stringVariable     # Deletes the entire variable

print(stringVariable)  #  This will cause an error


Hello World


NameError: name 'stringVariable' is not defined

###  String Special Operators in Python

Python supports special operators to work with strings:

####  Examples:

- `+` → **Concatenation**: joins two strings

  `"Hello" + "Hamza"` → `"HelloHamza"`


In [3]:
variable = "hey"
print ( variable + 'python')

heypython


- `*` → **Repetition**: repeats a string

  `"Hi" * 3` → `"HiHiHi"`


In [4]:
variable = "hello"
print(variable *3)

hellohellohello


- `[]` → **Indexing**: gives the character of the string at the given index

In [17]:
variable = 'hello'
print(variable [1])

e


- `[:]` → **Slicing**: gives a range of characters from the string

In [18]:
variable = 'hello'
print (variable [1:3])

el


- `in` → **Membership**: checks whether a substring exists in the string

In [20]:
variable= 'hello'
print('h'in variable)

True


In [21]:
variable ='hey'
print('z' in variable)

False


###  String Formatting Operator (%)

The `%` symbol is used to insert values into a string using placeholders.

####  Example:
```python
name = "Hamza"
age = 18
print("My name is %s and I am %d years old." % (name, age))
```


In [22]:
# Using % formatting
name = "Hamza"
age = 18

print("My name is %s and I am %d years old." % (name, age))


My name is Hamza and I am 18 years old.
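
The `%` operator is the oldest formatting style. As a point of comparison, the sketch below builds the same sentence with `str.format()` and with an f-string (available from Python 3.6); all three should produce identical text.

```python
name = "Hamza"
age = 18

percent_style = "My name is %s and I am %d years old." % (name, age)
format_style = "My name is {} and I am {} years old.".format(name, age)
fstring_style = f"My name is {name} and I am {age} years old."

print(fstring_style)
print(percent_style == format_style == fstring_style)
```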


###  Topic 4: Lists

A **list** is an ordered group of items stored in a single variable.  
* Lists are written using square brackets `[ ]` and can hold items of different data types.
* List indices start at 0, just like arrays in many other languages.


####  Example:
my_list = ['math', 99, 'physics']


In [5]:
list1 = ['Imran Khan','1992']
print(list1)

['Imran Khan', '1992']


### Accessing Values in a List

* We use **index numbers** inside square brackets to access list items.
* We can access a single value or a range of values (a slice) from a list.

#### Example:

subjects = ['math', 'physics', 'chemistry', 'biology']


In [6]:
list1 = ['geography', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7]
print("list1[0]: ", list1[0])
print("list2[1:5]: ", list2[1:5])

list1[0]:  geography
list2[1:5]:  [2, 3, 4, 5]


###  Updating List Items

* Lists in Python are **mutable**, so we can change their values.
* We can update a single element or multiple elements of a list by putting an index or a slice on the left-hand side of the assignment operator.
* New items can be added to the end of a list with the `append()` method.

**Explanation**:
* `subjects[0] = 'biology'` → changes the first item

* `subjects[-1] = 'computer'` → changes the last item

* The list becomes: `['biology', 'physics', 'computer']`

####  Example:

subjects = ['math', 'physics', 'chemistry']
subjects[0] = 'biology'
subjects[-1] = 'computer'


In [2]:
# Original list
subjects = ['math', 'physics', 'chemistry']

# Update the first item
subjects[0] = 'biology'

# Update the last item using negative index
subjects[-1] = 'computer'

print(subjects)  # Output: ['biology', 'physics', 'computer']


['biology', 'physics', 'computer']
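
Slice assignment and `append()` are mentioned above but not shown in the cell. A minimal sketch, with values chosen only for illustration:

```python
subjects = ['math', 'physics', 'chemistry']

# Replace the first two items at once with slice assignment.
subjects[0:2] = ['biology', 'statistics']

# Add a new item to the end of the list.
subjects.append('computer')

print(subjects)
```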


###  Deleting Items from a List

There are 3 main ways to delete items from a list:

* `del subjects[1]` → Deletes the item at index 1 → removes 'physics'

* `subjects.remove('chemistry')` → Removes the first item with that value

* `subjects.pop()` → Removes and returns the last item (here 'biology')

####  Note:

* `del` is used with an index

* `remove()` is used with a value

* `pop()` removes the last item by default (it also accepts an index)

In [3]:
subjects = ['math', 'physics', 'chemistry', 'biology']

# Using del
del subjects[1]  # Deletes 'physics'

# Using remove()
subjects.remove('chemistry')  # Removes item by value

# Using pop()
last_subject = subjects.pop()  # Removes last item and stores it

print(subjects)        # Output: ['math']
print(last_subject)    # Output: biology


['math']
biology


###  Topic 5: Dictionaries

Dictionaries in Python store **data as key-value pairs**.  
They are written using `{}` and each item is like `"key": value`.

####  Example:
student = {
    "name": "Hamza",
    "age": 18,
    "course": "AI Bootcamp"
}


In [7]:
# Creating a dictionary
student = {
    "name": "Hamza",
    "age": 18,
    "course": "AI Bootcamp"
}

### Accessing Values in a Dictionary

* We can access dictionary values using their **keys**.
* Put the key inside square brackets after the dictionary name to obtain its value, as in the following simple example.


In [7]:
# `info` is used instead of `dict` so the built-in type is not shadowed
info = {'Name': "Mehvish", 'Depart': "BSSE", 'Batch': 2014}
print(info['Depart'])

BSSE


###  Updating a Dictionary

* Dictionaries are **mutable**, so we can change or add values using keys.
* Assigning to an existing key updates its value.
* Assigning to a new key adds a new entry to the dictionary.




In [8]:
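# `info` is used instead of `dict` so the built-in type is not shadowed
info = {'Name': "Shaheen", 'Depart': "BSCS", 'Batch': 2010}
print(info)
info['Batch'] = "SP14"        # update an existing value
print(info)
info['University'] = "Nust"   # add a new key-value pair
print(info)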

{'Name': 'Shaheen', 'Depart': 'BSCS', 'Batch': 2010}
{'Name': 'Shaheen', 'Depart': 'BSCS', 'Batch': 'SP14'}
{'Name': 'Shaheen', 'Depart': 'BSCS', 'Batch': 'SP14', 'University': 'Nust'}


###  Deleting Items from a Dictionary

* You can remove individual key-value pairs or the entire contents of a dictionary.
* `del` removes an individual element, while the `clear()` method empties the dictionary.

####  Methods:

- `del my_dict["key"]` → Deletes a specific key
- `my_dict.pop("key")` → Removes the key and returns its value
- `my_dict.clear()` → Removes all key-value pairs (empties the dictionary)
- `del my_dict` → Deletes the dictionary variable completely

In [9]:
my_dict = {'Name': "Naseem", 'Depart': "BCAI", 'Batch': 2019}
print(my_dict)

del my_dict['Batch']
print(my_dict)


{'Name': 'Naseem', 'Depart': 'BCAI', 'Batch': 2019}
{'Name': 'Naseem', 'Depart': 'BCAI'}


In [14]:
my_dict = {'Name': "Mehvish", 'Depart': "BCS", 'Batch': 2014}
my_dict.clear()
print(my_dict)


{}
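
`pop()` is listed among the methods above but is not used in the cells. The sketch below shows it returning the removed value, and how passing a default avoids a `KeyError` for a missing key. The key `'City'` is hypothetical.

```python
my_dict = {'Name': "Naseem", 'Depart': "BCAI", 'Batch': 2019}

batch = my_dict.pop('Batch')          # removes the key and returns its value
missing = my_dict.pop('City', None)   # key absent: returns the default instead of raising

print(batch, missing)
print(my_dict)
```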


###  Topic 6: Tuples

* Tuples are like lists but **immutable** (they cannot be changed after creation).  
* Tuples are written using **round brackets `( )`**; an empty tuple is `()`.
* To write a tuple containing a single value, you must include a comma even though there is only one value, for example `tup = (40,)`.
* A tuple can hold values of different data types.
* Like string indices, tuple indices start at zero, and tuples can be sliced.

####  Example:

my_tuple = ('math', 'physics', 99, 99)


In [1]:
tup = ('Math','98', 'C programming', '99')
print (tup)
print(tup[1])
print(tup[1:3])

('Math', '98', 'C programming', '99')
98
('98', 'C programming')
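
The trailing comma in a single-element tuple is easy to forget. A short sketch of the difference, with names chosen only for illustration:

```python
not_a_tuple = (40)     # just the integer 40 wrapped in parentheses
single = (40,)         # the trailing comma makes it a tuple
empty = ()             # an empty tuple

print(type(not_a_tuple), type(single), len(single), len(empty))
```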


###  "Updating" a Tuple

Tuples are **immutable**, so you cannot change their items directly.

*  This will cause an error:

my_tuple = ('math', 'physics')
my_tuple[0] = 'biology'  


In [3]:
my_tuple = (1, 2, 3)
my_tuple[1] = 10  # Trying to update the second item


TypeError: 'tuple' object does not support item assignment

* **Instead, you can convert the tuple into a list, update it, and then convert it back to a tuple.**

In [4]:
temp_list = list(my_tuple)
temp_list[0] = 'biology'
my_tuple = tuple(temp_list)

print("Updated tuple:", my_tuple)

Updated tuple: ('biology', 2, 3)


###  Deleting a Tuple

* Tuples are **immutable**, so we cannot delete individual items.

* But we can delete the **entire tuple** using the `del` keyword.

####  Example:

my_tuple = ('math', 'physics', 'chemistry')
del my_tuple


In [14]:
tup = ('physics', 'chemistry', 1997, 2000)
print(tup)
del tup
print(tup)  # Raises NameError: tup no longer exists

('physics', 'chemistry', 1997, 2000)


NameError: name 'tup' is not defined

###  Topic: Sets in Python

* A **set** is an unordered collection of **unique** items.  
* Sets do **not allow duplicates** and do **not support indexing**.

####  Explanation:
* Every element of a set must be hashable (in practice, immutable) and unique, so a set holds no duplicates and its elements cannot be modified in place.

* The set itself is mutable: items can be added and removed.

* An empty set is written as `set()`; `{}` creates an empty dictionary.

* Duplicates are automatically removed, so an item such as 'math' appears only once.

* Items in a set literal are separated by **commas (,)**.

* We can make a set from a list using the **`set()` function**.

* The data type can be checked using the **`type()` function**.

* `my_set.add('biology')` → Adds a new item

* `my_set.remove('physics')` → Removes the item

* `'chemistry' in my_set` → Checks if the item exists

* `len(my_set)` → Gives the number of items

* `discard()` and `remove()` both delete a particular item from a set.

* `discard()` does not raise an error if the item does not exist in the set.

* `remove()` raises a `KeyError` if the item does not exist in the set.


In [20]:
 #list
list1 = [1,2,3,4,5]
print (type(list1))
my_set = set(list1)
print(my_set) 
print(type(my_set))
#set of integers
my_set = {1,2,3}
print (my_set)
#set of mixed data types
my_set = {1,"Hello", 1.2,'C'}
#adding a single value
my_set.add('D')
#adding multiple values
my_set.update(list1)
print (my_set)
my_set.discard('G')  # no error, even though 'G' is not in the set
my_set.remove('G')   # raises KeyError because 'G' is not in the set

<class 'list'>
{1, 2, 3, 4, 5}
<class 'set'>
{1, 2, 3}
{1, 1.2, 2, 'C', 3, 4, 5, 'D', 'Hello'}


KeyError: 'G'
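
Two details are worth checking by hand: `{}` creates an empty dictionary, so an empty set has to be created with `set()`, and the `KeyError` from `remove()` can be caught when a missing item is expected. A minimal sketch:

```python
empty_set = set()      # {} would create an empty dict, not a set
empty_dict = {}

my_set = {1, "Hello", 1.2, 'C'}
my_set.discard('G')    # silently does nothing: 'G' is not a member

try:
    my_set.remove('G')  # raises KeyError for a missing member
except KeyError as error:
    missing_item = error.args[0]
    print("Not in set:", missing_item)

print(type(empty_set), type(empty_dict))
```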

### Topic 7: Comparison Operators in Python

* Comparison operators are used to compare two values (strings or numbers).  
* They return `True` or `False` depending on the result.

####  Example:
```python
a = 10
b = 5
```


In [21]:
a = 10
b = 5

print(a == b)   
print(a != b)   
print(a > b)    
print(a < b)    
print(a >= b)   
print(a <= b)   


False
True
True
False
True
False


---

#  Control Flow in Python :

Control flow helps the program **decide what to do next** based on conditions.  
It includes:
- `if`, `elif`, `else` statements for decisions
- `for` and `while` loops for repeating actions

We’ll explore each one with examples.

---


### Topic 1: If-Else Statements

* `if` and `else` statements are used to control the flow of a program.

* The `if` block runs when the condition is true; otherwise the `else` block runs. `elif` adds further conditions that are checked in order.

###  Example:

age = 18

if age < 18:
    print("You are underage.")
elif age == 18:
    print("You just became an adult!")
else:
    print("You are an adult.")


In [24]:
age = 23
if age < 18:
    print('You are underage')
else:
    print('You are an adult')


You are an adult


###  Topic 2: Loops in Python

* Loops in Python let you repeat an action without writing the same code multiple times.

### For Loop in Python:

A `for` loop is used to **repeat a block of code** for every item in a list, string, or range.

### While Loop in Python:

A `while` loop runs **as long as its condition is true**.



In [31]:
#example of for loop
fact = 1
N = 5
for i in range(1, N+1):
    fact = fact * i
print("Factorial of", N, "is", fact)



Factorial of 5 is 120


In [32]:
# example of while loop 
i = 1
while i <= 5:
    print(i)
    i += 1


1
2
3
4
5


---

#  Functional Programming in Python

In Python, **functions** are reusable blocks of code that perform a specific task.  
This section covers:
- Defining and calling functions
- Using anonymous functions (`lambda`)
- Applying functions to collections with `map()` and `filter()`

---
### The syntax is:

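* `def function_name(parameters):` starts the definition
* `"function_docstring"` is an optional string that describes the function
* `function_suite` is the indented block of statements that forms the body
* `return [expression]` optionally sends a value back to the caller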

In [34]:
# Define a function
def greet(name):
    print("Hello", name)

# Call the function
greet("Hamza")
greet("Kashif")


Hello Hamza
Hello Kashif
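
The `greet` function prints its result and has no docstring. As a complement, here is a small sketch of a function that uses both a docstring and `return`; the name `rectangle_area` is made up for this example.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    return width * height

# The returned value can be stored and reused, unlike printed output.
area = rectangle_area(3, 4)
print(area)
print(rectangle_area.__doc__)   # the docstring is stored on the function
```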


### Topic 1: Lambda Functions

* A `lambda` function is a small, anonymous function with **no name**.  
* It can take any number of arguments but has only **one expression**.
* A lambda function has no `return` statement; the value of its single expression is returned automatically.
* The following code binds a lambda to the name `times3` and then calls it like a normal function:



In [38]:
times3 = lambda var: var * 3  # lambda function bound to the name times3
times3(10)                    # call it like any other function

30
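
To compare the two forms side by side, the sketch below defines the same computation once with `def` (as `f`) and once with `lambda` (as `g`). The formula is arbitrary and only serves the comparison.

```python
# Normal function definition
def f(x):
    return 3 * x + 1

# Equivalent lambda: the expression's value is returned implicitly
g = lambda x: 3 * x + 1

print(f(2), g(2))
```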

###  Topic 2: `map()` in Python

* The `map()` function applies a function to **every item** of an iterable such as a list.

* `map()` is typically called with two arguments: `r = map(func, seq)`

* The first argument `func` is a function and the second, `seq`, is a sequence (e.g. a list).

* `map()` applies `func` to each element of `seq`. In Python 3 it returns a lazy iterator rather than a list, so wrap it in `list()` to get a list of the results.




In [39]:
# List of numbers
nums = [1, 2, 3, 4, 5]

# Using map with a lambda to square each number
squares = list(map(lambda x: x**2, nums))
print("Squares:", squares)  # Output: [1, 4, 9, 16, 25]


Squares: [1, 4, 9, 16, 25]
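
In Python 3 the object returned by `map()` is a one-shot iterator, which is why the cell above wraps it in `list()`. The sketch below makes that visible, using a named function `square` (hypothetical) in place of a lambda.

```python
def square(x):
    return x ** 2

nums = [1, 2, 3, 4, 5]
result = map(square, nums)        # a lazy map object, nothing computed yet
is_list = isinstance(result, list)

squares = list(result)            # consuming the iterator produces the values
leftover = list(result)           # the iterator is now exhausted

print(is_list, squares, leftover)
```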


###  Filters in Python

* The function `filter(function, iterable)` offers an elegant way to keep only those elements of a list for which `function` returns true.

* `filter(f, l)` needs a function `f` as its first argument. `f` should return a Boolean value, i.e. either `True` or `False`.

* This function is applied to every element of the list `l`.

* An element is included in the result only if `f` returns `True` for it. In Python 3 the result is a lazy iterator, so wrap it in `list()` to get a list.




In [40]:
# List of numbers
nums = [1, 2, 3, 4, 5, 6]

# Using filter to keep even numbers only
evens = list(filter(lambda x: x % 2 == 0, nums))

print("Even numbers:", evens)  


Even numbers: [2, 4, 6]


---

#  File I/O in Python

This section covers how to:
- Open and read text files
- Write data into files
- Work with file positions using `seek()` and `tell()`
- Use the `os` module for file system operations

---


###  Reading Input from Keyboard

* Python 3 uses the built-in `input()` function to read input from the keyboard.

* In Python 2 the equivalent function was `raw_input()`; it no longer exists in Python 3.

* `input()` reads one line from standard input and returns it as a string.



In [2]:
# input() is built in on Python 3, so no compatibility import is needed
string = input("Enter your name: ")
print(string)

Enter your name:  hamza 


hamza 


###  I/O from and to Text Files

Python lets you **store and read data** from `.txt` files using the `open()` function.

####  Explanation

* `"w"` → write mode (creates the file or overwrites it)

* `"r"` → read mode

* `"r+"` → read and write mode (the file must already exist)

* `"a"` → append mode

* `"a+"` → append and read mode

* `write()` → writes text to the file

* `read()` → reads the full file content

* `for line in file:` → reads the file line by line

* `strip()` → removes surrounding whitespace, including the trailing newline `\n`, from a line


In [23]:
# Open a file to read and write
fileOpen = open("file.txt", "r+")

# To read specific content from start you can use read(12). It will read 12 characters from the start of the file
content = fileOpen.read()  # or use fileOpen.read(12) to read first 12 characters
print(content)

# Close opened file
fileOpen.close()



Name: Hamza Kashif
Department: BSCS Information Technology Lahore



In [26]:
# Open a file to append
fileOpen = open("file.txt", "a+")
fileOpen.write("Information Technology Lahore\n")  # \n for a new line
fileOpen.close()

# Open the file in read mode
fileOpen = open("file.txt", "r+")

# To read specific content from the start, use read(12) to read first 12 characters
content = fileOpen.read()
print("File content:\n", content)

# Close the opened file
fileOpen.close()


File content:
 Name: Hamza Kashif
Department: BSCS Information Technology Lahore
Information Technology Lahore
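
The cells above call `close()` by hand. A common alternative is the `with` statement, which closes the file automatically when the block ends. The sketch below assumes a throwaway file called `demo.txt` (a hypothetical name) inside a temporary directory, so it does not depend on `file.txt` existing.

```python
import os
import tempfile

# Work in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")   # hypothetical file name

    with open(path, "w") as handle:           # "w" creates the file
        handle.write("Name: Hamza\n")

    with open(path, "a") as handle:           # "a" appends to the end
        handle.write("Department: BSCS\n")

    with open(path, "r") as handle:           # iterate line by line
        lines = [line.strip() for line in handle]

    with open(path, "r") as handle:
        first_four = handle.read(4)           # read the first 4 characters
        position = handle.tell()              # current position in the file
        handle.seek(0)                        # jump back to the beginning
        again = handle.read(4)

print(lines)
print(first_four, position, again)
```

Each `with` block closes its file even if an error occurs inside it, so no explicit `close()` call is needed.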



###  File  Position – `tell()` and `seek()`

Python lets you **check and change** where the file is being read from using:

- `tell()` → returns the current position of the file pointer
- `seek(n)` → moves the file pointer to position `n`

#### Example:

file = open("pointer_demo.txt", "r")

* `file.tell()` → shows the current position in the file

* `file.seek(0)` → moves the pointer back to the beginning


In [27]:
# Open a file
fo = open("file.txt", "r+")
text = fo.read(10)  # `text` avoids shadowing the built-in `str`
print("Read string is:\n", text)
# Check current position
position = fo.tell()
print("Current file position:\n", position)
# Reposition the pointer at the beginning once again
position = fo.seek(0, 0)
text = fo.read(10)
print("Again read string is:\n", text)
# Close opened file
fo.close()


Read string is:
 Name: Hamz
Current file position:
 10
Again read string is:
 Name: Hamz


In [28]:
import os
# Rename a file
os.rename("file.txt", "newfile.txt")

In [29]:
# Remove a file
os.remove("newfile.txt")

---

# Introduction to Pandas

Pandas is a powerful Python library built on top of NumPy. It is used for:
- Data analysis
- Data cleaning
- Working with tabular data (like Excel or CSV)

It also has built-in visualization features and can work with data from a wide variety of sources.

In this section, we’ll learn:
- What is a `Series`
- What is a `DataFrame`
- How to create and manipulate them

---


---

# Series

- A Series is very similar to a NumPy array.

- A Series is like a single column in Excel or in a table.

- It is a one-dimensional, list-like structure.

- It can hold any type of data: numbers, text, etc.

- Every value has a label (index); a default integer index is created if none is given.

- It is part of the Pandas library (used for data analysis).

### Topic 1: Series from ndarray

- ndarray stands for N-dimensional array.

- It comes from the NumPy library (used for numerical computing in Python).

- It is used to store large collections of numbers in a grid or table-like structure.

- It can be 1D, 2D, 3D, or even higher dimensions.

- It is faster and more memory efficient than Python lists.

### Explanation:

- If data is an ndarray, index must be the same length as **data**.

- If no index is passed, one will be created with the values `[0, ..., len(data) - 1]`.

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create a Series with pandas: the data is 5 random values
# and the index labels are 'a' through 'e'.
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s


a   -0.191725
b   -0.196255
c    1.409578
d    0.774689
e   -0.406107
dtype: float64

In [6]:
# The index attribute holds the labels and their data type
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [7]:
# If no index is given, a default one is created with values [0, ..., len(data) - 1]
pd.Series(np.random.randn(5))

0   -1.616634
1   -0.741171
2    0.166439
3    0.824701
4   -1.308730
dtype: float64

### Topic 2: Creating a Pandas Series from a Dictionary

- A dictionary contains key–value pairs.
- When passed into `pd.Series()`, the **keys become indexes**, and **values become data**.

Example:
```python
data = {'a': 10, 'b': 20, 'c': 30}
s = pd.Series(data)
```


In [10]:
# In the following example no index is given, so it is built from the dictionary keys.
d = {'a' : 0., 'b': 1., 'c' : 2.} # a Python dict
s = pd.series(d)  # wrong: the class name is `Series` with a capital S
print(s)

AttributeError: module 'pandas' has no attribute 'series'

**The error occurs because the class is named `Series` with a capital "S"; `pd.series` does not exist. This is the correct form of the code:**

In [11]:
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}  # A Python dictionary
s = pd.Series(d)
print(s)


a    0.0
b    1.0
c    2.0
dtype: float64


In [19]:
# In the following example an index is given, so the values whose keys match the index labels are pulled out; labels with no matching key get NaN

pd.Series(d, index=['b', 'c', 'd', 'a'])

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
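
The `NaN` for label `'d'` marks a label that has no matching key in the dictionary. One way to find such labels programmatically is `isna()`, sketched below under the assumption that pandas is installed.

```python
import pandas as pd

d = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(d, index=['b', 'c', 'd', 'a'])

missing = s.isna()                        # True where no value was found
missing_labels = list(s[missing].index)   # labels that had no key in d

print(missing_labels)
```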

###  Topic 3: Creating a Series from a Scalar Value

- A scalar value is a single number or string.
- If we provide a scalar and a list of indexes, Pandas repeats the value for each index.

Example:
```python
pd.Series(100, index=['a', 'b', 'c', 'd'])
```


In [20]:
# In the following example a scalar value is given as data, so it is repeated to match the length of the index

pd.Series(5., index=['a','b','c','d','e'])

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

### Topic 4: Series is ndarray-like

A Pandas Series behaves like a NumPy ndarray:

- You can access elements by position using `s.iloc[0]` (plain `s[0]` on a labelled Series is deprecated in recent pandas versions)
- You can slice it using `s[1:4]`
- You can apply conditions like `s > 0`

These operations are fast and efficient due to the underlying NumPy array.


In [26]:
# We can access a value just like with an ndarray
# Access a single value by position (s[0] also works but is deprecated for labelled Series)
s.iloc[0]


np.float64(3.14)

In [29]:
#access range of values
s[:5]

a   -0.093998
b   -0.468539
c   -1.296773
d   -0.493565
e   -0.208271
dtype: float64

In [30]:
# Boolean indexing: keep only the values greater than the median of the Series

s[s > s.median()]

a   -0.093998
e   -0.208271
dtype: float64

In [33]:
# Select values by position: 4, 3 and 1 are positions, not labels,
# so this returns the entries labelled 'e', 'd' and 'b' in that order.
s.iloc[[4, 3, 1]]


e   -0.208271
d   -0.493565
b   -0.468539
dtype: float64

In [34]:
# NumPy functions work element-wise on a Series: this computes e**x
# for every value x and keeps the index labels.
np.exp(s)

a    0.910284
b    0.625916
c    0.273413
d    0.610446
e    0.811987
dtype: float64

In [35]:
"""Following example will get the data of given index"""
s['a']


np.float64(-0.0939982751344544)

In [36]:
"""Following example will update the data of the given index"""
s['e'] = 12.
s 

a    -0.093998
b    -0.468539
c    -1.296773
d    -0.493565
e    12.000000
dtype: float64

In [37]:
# Returns True if 'e' is one of the index labels, otherwise False
'e' in s


True

In [38]:
"""If a label is not contained and you are trying to access its data, an exception is raised: """
s['f']
# This will create error

KeyError: 'f'

In [39]:
# With the get method, a missing label returns None or the specified default
s.get('f')          # returns None
s.get('f', np.nan)  # returns the given default value

nan

### Topic 5: Vectorized Operations & Label Alignment in Series

####  Vectorized Operation:
You can perform operations like `+`, `-`, `*`, `/` on entire Series — no loop needed.

####  Label Alignment:
When two Series have **different indexes**, Pandas **matches by label**, not by position.

####  Example:

s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2 = pd.Series([1, 2, 3], index=["b", "c", "d"])
result = s1 + s2
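
Running that example gives the following; the sketch assumes pandas is installed. Labels that exist in only one of the two Series (`'a'` and `'d'`) have nothing to be added to, so they come out as `NaN`.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2 = pd.Series([1, 2, 3], index=["b", "c", "d"])

result = s1 + s2   # aligned by label: only 'b' and 'c' exist in both

print(result)
```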


In [40]:
# Adding a Series to itself adds the values that share the same label:
# a = s['a'] + s['a']
# b = s['b'] + s['b']
# c = s['c'] + s['c']
# d = s['d'] + s['d']
# e = s['e'] + s['e']
s + s

a    -0.187997
b    -0.937078
c    -2.593546
d    -0.987129
e    24.000000
dtype: float64

In [41]:
# Multiplying by 2 multiplies the value at every label by 2:
# a = s['a'] * 2
# b = s['b'] * 2
# c = s['c'] * 2
# d = s['d'] * 2
# e = s['e'] * 2
s * 2

a    -0.187997
b    -0.937078
c    -2.593546
d    -0.987129
e    24.000000
dtype: float64

In [42]:
s = pd.Series(np.random.randn(5), name='something')
s

0    0.538428
1   -0.378590
2   -1.585063
3    1.367985
4    1.320546
Name: something, dtype: float64

In [43]:
# This will print the name attribute of series
s.name

'something'

In [44]:
#rename the series name attribute and assign to s2 object. Note that s and s2 refer to different objects.
s2 = s.rename("different")
s2.name

'different'

---

# What is a DataFrame?

A **DataFrame** in Pandas is a powerful data structure that:

-  Is **2-dimensional** (rows and columns)
-  Stores **tabular data** (like Excel or CSV)
-  Is built from **Series objects** (each column is a Series)
-  Each column can have a **different data type**
-  Comes with **automatic indexing** (0, 1, 2, …)
-  Supports many data sources (dictionaries, lists, arrays, CSV files, etc.)
-  Provides easy tools to **filter, analyze, and clean** data
-  Ideal for **real-world datasets** like surveys, logs, student records, etc.

> In short: **A DataFrame = spreadsheet in code.**


###  Topic 1 : Creating DataFrame from Dict of Series or Dicts

You can build a DataFrame using:

#### Dictionary of Series:
* Each **Series becomes a column**, and their **index labels become row labels**.
* The resulting index will be the union of the indexes of the various Series.
* If there are any nested dicts, these will first be converted to Series.
* If no columns are passed, the columns are taken from the dict keys (in insertion order on current pandas versions; older versions sorted them).

### Example:

* `name_series = pd.Series(["Hamza", "Ali", "Sara"], index=[101, 102, 103])`
* `age_series = pd.Series([23, 25, 22], index=[101, 102, 103])`
* `df = pd.DataFrame({"Name": name_series, "Age": age_series})`


In [45]:
# A dict of Series is created
d = {
'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])
}
# Create a DataFrame. The row labels are the union of the Series indexes.
# No column labels are given, so the columns come from the dict keys.
df=pd.DataFrame(d)
df

   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0


In [5]:
import pandas as pd

# A dict is created
d = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
}

# create a dataframe, row label will be the index of a series
df = pd.DataFrame(d)

# a data frame will be constructed for given row labels
df = pd.DataFrame(d, index=['d', 'b', 'a'])

print(df)

   one  two
d  NaN    4
b  2.0    2
a  1.0    1


In [6]:
# The following example shows a DataFrame built with explicit column labels; 'three' is not a key of d, so that column is all NaN
pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])


   two three
d    4   NaN
b    2   NaN
a    1   NaN


In [7]:
df.columns

Index(['one', 'two'], dtype='object')

### Topic 2: Creating DataFrame from Dictionary of Lists / Arrays

* This is the most common way to create a DataFrame.

* If an index is passed, it must be the same length as the arrays.

* If no index is passed, the result will be `range(n)`, where `n` is the array length.

####  Format:

data = {
   
    "Name": ["Hamza", "Ali", "Sara"],
    "Age": [23, 25, 22],
    "City": ["Lahore", "Karachi", "Islamabad"]
}
df = pd.DataFrame(data)


In [8]:
import pandas as pd

# Using dictionary of lists
data = {
    "Name": ["Hamza", "Ali", "Sara"],
    "Age": [23, 25, 22],
    "City": ["Lahore", "Karachi", "Islamabad"]
}

df = pd.DataFrame(data)
print(df)


    Name  Age       City
0  Hamza   23     Lahore
1    Ali   25    Karachi
2   Sara   22  Islamabad


In [12]:
# Column labels are not given, so both the rows and the columns are labelled range(n)

import pandas as pd

# List of lists without column labels
data = [
    ["Hamza", 23, "Lahore"],
    ["Ali", 25, "Karachi"],
    ["Sara", 22, "Islamabad"]
]

df = pd.DataFrame(data)

print(df)


       0   1          2
0  Hamza  23     Lahore
1    Ali  25    Karachi
2   Sara  22  Islamabad


In [13]:
# If an index is given, it must have the same length as the arrays

import pandas as pd

# Dictionary of equal-length lists
data = {
    "Name": ["Hamza", "Ali", "Sara"],
    "Age": [23, 25, 22]
}

# Custom index of same length (3 items)
index_labels = ["a", "b", "c"]

df = pd.DataFrame(data, index=index_labels)
print(df)


    Name  Age
a  Hamza   23
b    Ali   25
c   Sara   22


###  Topic 3 : Creating DataFrame from List of Dictionaries

You can pass a **list of dictionaries** to create a DataFrame.  
Each dictionary becomes a **row**, and keys become **column names**.

####  Example:

data = [
 
    {"Name": "Hamza", "Age": 23, "City": "Lahore"},
    {"Name": "Ali", "Age": 25},
    {"Name": "Sara", "Age": 22, "City": "Islamabad"}
]
df = pd.DataFrame(data)


In [3]:
"""constructing data frame from a list of dicts"""

import pandas as pd

# Constructing DataFrame from list of dictionaries
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data2)

print(df)


   a   b     c
0  1   2   NaN
1  5  10  20.0


In [4]:
"""passing list of dicts as data and indexes (row labels)"""
pd.DataFrame(data2, index=['first', 'second'])

        a   b     c
first   1   2   NaN
second  5  10  20.0


In [5]:
"""passing list of dicts as data and columns (columns labels)"""
pd.DataFrame(data2, columns=['a', 'b'])

   a   b
0  1   2
1  5  10


###  Topic 4 : Creating DataFrame from Dict of Tuples

- You can create a DataFrame from a **dictionary keyed by tuples**.

- The shape of the resulting DataFrame depends on the structure of the dictionary.

- Passing a dictionary with tuple keys automatically creates a multi-indexed frame.

In [6]:
pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})
# NaN marks missing data

       a              b      
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0


### Topic 5 : Alternate Constructors 

* Alternate constructors are **class methods** that build DataFrames from specific structured inputs.

**DataFrame.from_dict**:

* **DataFrame.from_dict** takes a dict of dicts or a dict of array-like sequences and returns a DataFrame.

* It operates like the DataFrame constructor, except for the `orient` parameter, which is `'columns'` by default but can be set to `'index'` in order to use the dict keys as row labels.

**DataFrame.from_records**:

* **DataFrame.from_records** takes a list of tuples or an ndarray with a structured dtype.

* It works analogously to the normal DataFrame constructor, except that `index` may be a specific field of the structured dtype to use as the index.



In [9]:
import numpy as np

# Use 'S' instead of deprecated 'a'
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
print(data)


[(0, 0., b'') (0, 0., b'')]


In [10]:
pd.DataFrame.from_records(data, index='C')

     A    B
C
b''  0  0.0
b''  0  0.0


**DataFrame.from_items**:

* **DataFrame.from_items** works analogously to the form of the dict constructor that takes a sequence of (key, value) pairs, where the keys are column (or row, in the case of orient='index') names, and the values are the column values (or row values). It was deprecated in pandas 0.23 and removed in pandas 1.0, so the cells below use an `OrderedDict` with the regular constructor and with `DataFrame.from_dict` instead.

* This can be useful for constructing a DataFrame with the columns in a particular order without having to pass an explicit list of columns.

In [14]:
from collections import OrderedDict
import pandas as pd

df = pd.DataFrame(OrderedDict([('A', [4, 5, 6]), ('B', [7, 8, 9])]))
print(df)


   A  B
0  4  7
1  5  8
2  6  9


In [16]:
from collections import OrderedDict
import pandas as pd

data = OrderedDict([
    ('A', [4, 5, 6]),
    ('B', [7, 8, 9])
])

df = pd.DataFrame.from_dict(data, orient='index', columns=['one', 'two', 'three'])
print(df)


   one  two  three
A    4    5      6
B    7    8      9


### Topic 6:  Column Selection, Addition, and Deletion :

* A DataFrame can be treated semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations.

####  Selection
- `df['Name']` → selects one column
- `df[['Name', 'Marks']]` → selects multiple columns

####  Addition
- `df['Grade'] = ['A', 'A+', 'B']` → adds new column
- `df['Bonus'] = df['Marks'] + 5` → adds calculated column

####  Deletion
- `del df['Bonus']` → deletes column in-place
- `df = df.drop('Grade', axis=1)` → deletes column and returns new DataFrame
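
The bullets above refer to columns such as `Name`, `Marks`, `Grade`, and `Bonus`. A minimal runnable sketch with a made-up frame (`df_marks` and its values are assumptions for illustration only) could look like this:

```python
import pandas as pd

df_marks = pd.DataFrame(
    {"Name": ["Hamza", "Ali", "Sara"], "Marks": [80, 92, 75]},
    index=["a", "b", "c"],
)

# Selection
names = df_marks["Name"]               # one column -> Series
subset = df_marks[["Name", "Marks"]]   # list of columns -> DataFrame

# Addition
df_marks["Grade"] = ["A", "A+", "B"]           # new column from a list
df_marks["Bonus"] = df_marks["Marks"] + 5      # new calculated column
bonus_values = df_marks["Bonus"].tolist()

# Deletion
del df_marks["Bonus"]                          # removes the column in place
df_marks = df_marks.drop("Grade", axis=1)      # returns a new DataFrame
print(df_marks)
```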


In [17]:
df['one']  # select the column 'one' (returns a Series)

A    4
B    7
Name: one, dtype: int64

In [18]:
df['three'] = df['one'] * df['two']  # overwrite column 'three' with the product of 'one' and 'two'

In [19]:
df['flag'] = df['one'] > 2  # True where the value in column 'one' is greater than 2, otherwise False

In [20]:
df  # display the complete DataFrame

   one  two  three  flag
A    4    5     20  True
B    7    8     56  True


* **Columns** can be deleted or popped, as with a dict:

In [21]:
del df['two']  # delete column 'two' from the DataFrame


In [22]:
three = df.pop('three')  # remove column 'three' from the DataFrame and return it as a Series

In [23]:
df

   one  flag
A    4  True
B    7  True


* When inserting a scalar value, it will naturally be propagated to fill the column:

In [24]:
df['foo'] = 'bar'  # every row of the new column 'foo' is filled with 'bar'

In [25]:
df

   one  flag  foo
A    4  True  bar
B    7  True  bar


* When inserting a Series that does not have the same index as the DataFrame, it will be conformed to the DataFrame’s index:
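
A small sketch of what "conformed" means here, using a made-up three-row frame (`frame` is a name chosen for this illustration): rows that are missing from the assigned Series end up as NaN.

```python
import pandas as pd

frame = pd.DataFrame({"one": [4, 7, 9]}, index=["A", "B", "C"])

# The Series on the right only has labels 'A' and 'B'.
# Assignment aligns by label, so row 'C' receives NaN.
frame["one_trunc"] = frame["one"].iloc[:2]
print(frame)
```

Because NaN is a float, the new column is stored as floating point even though the source values were integers.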

In [26]:
"""Take the first two values of column 'one' and assign them to a new column; they are matched to the DataFrame's rows by index label."""
df['one_trunc'] = df['one'][:2]

In [27]:
df

   one  flag  foo  one_trunc
A    4  True  bar          4
B    7  True  bar          7


* By default, columns get inserted at the end. The insert function is available to insert at a particular location in the columns:

In [28]:
"""insert() takes three arguments.
First argument: integer position where the new column will be inserted.
Second argument: label of the new column.
Third argument: the values for the new column."""
df.insert(1, 'bar2', df['one'])

In [29]:
df

   one  bar2  flag  foo  one_trunc
A    4     4  True  bar          4
B    7     7  True  bar          7


###  Topic 7: Indexing and Selection:

Row selection, for example, returns a Series whose index is the columns of the DataFrame.

####  Access Columns
- `df['Name']` → Single column
- `df[['Name', 'Marks']]` → Multiple columns

####  Access Rows
- `df.loc['a']` → Access by label
- `df.iloc[0]` → Access by position

####  Access Specific Cell
- `df.loc['a', 'Marks']` → By label
- `df.iloc[0, 1]` → By position

####  Slice Rows
- `df[0:2]` → First two rows
- `df.loc['a':'b']` → Rows from 'a' to 'b'
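
The access patterns listed above can be tried on a small frame. The sketch below uses a made-up `students` frame with row labels `'a'`, `'b'`, `'c'`; the names and values are assumptions for illustration.

```python
import pandas as pd

students = pd.DataFrame(
    {"Name": ["Hamza", "Ali", "Sara"], "Marks": [80, 92, 75]},
    index=["a", "b", "c"],
)

row_by_label = students.loc["a"]     # Series whose index is the column names
row_by_pos = students.iloc[0]        # the same row, selected by position
cell = students.loc["a", "Marks"]    # a single value, by label
same_cell = students.iloc[0, 1]      # the same value, by position

first_two = students[0:2]            # positional slice: stop is excluded
a_to_b = students.loc["a":"b"]       # label slice: stop is included

print(row_by_label)
print(cell, same_cell)
```

Note the difference between the two slices: the positional slice `0:2` stops before position 2, while the label slice `'a':'b'` includes the row labeled `'b'`.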


In [33]:
df.loc['B']  # returns the column labels and values for the row labeled 'B'

one             7
bar2            7
flag         True
foo           bar
one_trunc       7
Name: B, dtype: object

In [35]:
df.iloc[1]  # returns the row at integer position 1 (the second row, labeled 'B')

one             7
bar2            7
flag         True
foo           bar
one_trunc       7
Name: B, dtype: object

###  Topic 8: Data Alignment & Arithmetic in DataFrames

* Pandas aligns **both rows and columns by label** when performing arithmetic operations between DataFrames.
* The resulting object has the union of the column and row labels; positions that exist in only one operand are filled with NaN.


In [36]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

In [37]:
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

In [38]:
df + df2  # adds values with matching row and column labels; unmatched positions become NaN

Unnamed: 0,A,B,C,D
0,0.571587,0.242891,1.098381,
1,-0.589415,-2.069065,0.511396,
2,-1.460503,0.640814,0.983695,
3,1.845109,0.988191,-0.47376,
4,0.109613,-1.48735,0.459597,
5,1.95452,0.417492,1.789018,
6,1.38917,-3.059659,-1.352074,
7,,,,
8,,,,
9,,,,
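
Because the frames above hold random numbers, the alignment rule is easier to verify on fixed values. The following is a small sketch with made-up frames `left` and `right`; the `add(..., fill_value=0)` call is shown as one possible way to avoid NaN where only one operand has a value.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
right = pd.DataFrame({"A": [100, 200], "C": [7, 8]})

# Result has the union of labels: columns A, B, C and rows 0, 1, 2.
# Only cells present in BOTH frames get a number; the rest are NaN.
total = left + right
print(total)

# add() with fill_value treats a value missing on one side as 0.
# Cells missing on both sides (row 2, column C) stay NaN.
filled = left.add(right, fill_value=0)
print(filled)
```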


* When doing an operation between a DataFrame and a Series, the default behavior is to align the Series index on the DataFrame columns and broadcast row-wise. For example:

In [39]:
df - df.iloc[0]

Unnamed: 0,A,B,C,D
0,0.0,0.0,0.0,0.0
1,-2.18115,-0.808237,2.130582,-1.617624
2,-2.195594,-0.983505,0.986787,1.429948
3,-1.504762,-1.174795,2.258893,-0.454176
4,-0.933641,-1.304956,2.525425,0.436265
5,0.391504,-1.257995,2.469039,-1.048214
6,-1.386845,-2.576266,-0.158712,-0.515423
7,-1.232739,-1.274511,0.900509,-1.149099
8,-2.301157,-2.954864,2.36449,0.721083
9,-1.300159,-1.440829,0.996095,0.433566
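
The same row-wise broadcasting can be checked by hand on fixed numbers. This sketch uses a made-up frame `grid`; the `sub(..., axis=0)` call is included as an assumption about what a reader might want next, namely matching on the row index instead of the columns.

```python
import pandas as pd

grid = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

# Default: the Series index ('A', 'B') is matched to the columns,
# and the subtraction is repeated for every row.
first_row = grid.iloc[0]
diff = grid - first_row
print(diff)

# To match on the row index instead, use sub() with axis=0.
by_col = grid.sub(grid["A"], axis=0)
print(by_col)
```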


In [40]:
df*5+2

Unnamed: 0,A,B,C,D
0,9.882689,8.06697,-6.200534,3.967827
1,-1.023059,4.025783,4.452378,-4.120291
2,-1.095283,3.149445,-1.266599,11.117566
3,2.358881,2.192995,5.093929,1.696946
4,5.214485,1.542188,6.42659,6.149149
5,11.840207,1.776992,6.144662,-1.273242
6,2.948465,-4.81436,-6.994093,1.390712
7,3.718996,1.694413,-1.697988,-1.777669
8,-1.623095,-6.707351,5.621916,7.573242
9,3.381893,0.862823,-1.220059,6.135658


In [41]:
1 / df

Unnamed: 0,A,B,C,D
0,0.634301,0.824135,-0.609716,2.540874
1,-1.653954,2.468181,2.038837,-0.816955
2,-1.615361,4.349926,-1.530644,0.548392
3,13.93218,25.907353,1.616068,-16.498714
4,1.555459,-10.92151,1.129538,1.205066
5,0.508119,-22.420743,1.206371,-1.527537
6,5.271676,-0.733745,-0.55592,-8.206297
7,2.908674,-16.361963,-1.352087,-1.323567
8,-1.380035,-0.574227,1.380485,0.897144
9,3.618225,-4.396851,-1.552767,1.208997


In [42]:
df **4

Unnamed: 0,A,B,C,D
0,6.177568,2.16774,7.235833,0.023992
1,0.133631,0.026946,0.057872,2.244959
2,0.146866,0.002793,0.182181,11.056959
3,2.7e-05,2e-06,0.146609,1.3e-05
4,0.17083,7e-05,0.614324,0.474194
5,15.00158,4e-06,0.472146,0.183668
6,0.001295,3.450009,10.470065,0.000221
7,0.013971,1.4e-05,0.299214,0.325848
8,0.275701,9.197381,0.275343,1.54366
9,0.005835,0.002676,0.172018,0.468057


* Boolean operators work as well:

In [43]:
df1 = pd.DataFrame({'a' : [1, 0, 1], 'b' : [0, 1, 1] }, dtype=bool)

In [44]:
df2 = pd.DataFrame({'a' : [0, 1, 1], 'b' : [1, 1, 0] }, dtype=bool)

In [45]:
pd.DataFrame({'a' : [0, 1, 1], 'b' : [1, 1, 0] }, dtype=bool)

Unnamed: 0,a,b
0,False,True
1,True,True
2,True,False


In [46]:
df1 & df2  # element-wise logical AND

Unnamed: 0,a,b
0,False,False
1,False,True
2,True,False


In [47]:
df1 | df2  # element-wise logical OR

Unnamed: 0,a,b
0,True,True
1,True,True
2,True,True


In [48]:
-df1  # element-wise logical NOT (more commonly written as ~df1)

Unnamed: 0,a,b
0,False,True
1,True,False
2,False,False


###  Topic 9: Transposing a DataFrame

To **transpose** a DataFrame (flip rows ↔ columns), use the `.T` attribute, just as you would with an ndarray.

####  Example:
```python
df.T
```

In [49]:
# transpose only the first 6 rows (they become the columns)
df[:6].T

Unnamed: 0,0,1,2,3,4,5
A,1.576538,-0.604612,-0.619057,0.071776,0.642897,1.968041
B,1.213394,0.405157,0.229889,0.038599,-0.091562,-0.044602
C,-1.640107,0.490476,-0.65332,0.618786,0.885318,0.828932
D,0.393565,-1.224058,1.823513,-0.060611,0.82983,-0.654648


* Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns:

In [52]:
dates = pd.date_range('20160101', periods=6)

In [53]:
dates

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', freq='D')

In [54]:
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))


In [55]:
df

Unnamed: 0,A,B,C,D
2016-01-01,-0.673657,1.606244,-0.525905,-0.347189
2016-01-02,1.382708,0.267742,-1.322686,-1.003976
2016-01-03,-0.177106,0.330812,-0.852325,-1.010773
2016-01-04,-0.626326,-0.03305,0.649773,-2.402037
2016-01-05,0.440802,-2.449788,-1.11957,-0.715579
2016-01-06,-0.044679,-0.77699,0.311611,-2.012203


* Creating a DataFrame by passing a dict of objects that can be converted to Series-like objects; scalar values are repeated for every row.

In [56]:
df2 = pd.DataFrame({ 'A' : 1.,
'B' : pd.Timestamp('20160102'),
'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
'D' : np.array([3] * 4,dtype='int32'),
'E' : pd.Categorical(["test","train","test","train"]),
'F' : 'foo' })

In [57]:
df2

Unnamed: 0,A,B,C,D,E,F
0,1.0,2016-01-02,1.0,3,test,foo
1,1.0,2016-01-02,1.0,3,train,foo
2,1.0,2016-01-02,1.0,3,test,foo
3,1.0,2016-01-02,1.0,3,train,foo


In [59]:
#Having specific dtypes
df2.dtypes

A          float64
B    datetime64[s]
C          float32
D            int32
E         category
F           object
dtype: object

---
# Viewing Data :

Viewing data means inspecting, exploring, and summarizing the contents of a Pandas DataFrame — especially at the start of your analysis.

It helps you:

* Understand what your dataset looks like.

* Check for missing or incorrect data.

* Explore patterns and distributions.

* Confirm that data is loaded correctly.

### Key Tools :
1. `df.head()` → View first 5 rows  
2. `df.tail()` → View last 5 rows  
3. `df.columns` → List all column names  
4. `df.shape` → Show shape (rows, cols)  
5. `df.index` → Show row index range  
6. `df.dtypes` → Data types of each column  
7. `df.info()` → Summary: columns, nulls, types  
8. `df.describe()` → Stats for numeric columns  
9. `df["col"].value_counts()` → Count of unique values  
10. `df["col"].unique()` → Unique values in a column  
11. `df["col"].nunique()` → Number of unique values  
12. `df.isnull().sum()` → Missing values per column  
13. `df[0:3]` → Slice first 3 rows  
14. `df[["col1", "col2"]]` → View selected columns  
15. `df[df["col"] > value]` → Filter rows conditionally
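
Several tools from the list (`value_counts()`, `nunique()`, `isnull().sum()`, and conditional filtering) can be tried on a small frame. The sketch below uses a made-up `people` frame with one missing city; the names and values are assumptions for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Hamza", "Ali", "Sara", "Ahmed"],
    "City": ["Lahore", "Karachi", "Lahore", None],
    "Age": [23, 25, 22, 24],
})

city_counts = people["City"].value_counts()   # missing values are not counted
unique_cities = people["City"].nunique()      # number of distinct non-missing values
missing = people.isnull().sum()               # missing values per column
older = people[people["Age"] > 23]            # keep only rows where Age > 23

print(people.shape)
print(city_counts)
print(missing)
print(older)
```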




In [4]:
import pandas as pd

# Sample DataFrame
data = {
    "Name": ["Hamza", "Ali", "Sara", "Ahmed", "Usman", "Adeel"],
    "Age": [23, 25, 22, 24, 26, 28],
    "City": ["Lahore", "Karachi", "Islamabad", "Lahore", "Peshawar", "Multan"]
}

df = pd.DataFrame(data)

# View first 5 rows
print("First 5 rows:\n", df.head())

# View last 5 rows
print("\nLast 5 rows:\n", df.tail())

# View first 3 rows
print("\nFirst 3 rows:\n", df.head(3))

# View column names
print("\nColumn Names:\n", df.columns)

# Get basic info (types, nulls, etc.)
print("\nInfo:\n")
df.info()

# Get descriptive stats
print("\nDescriptive Statistics:\n", df.describe())


First 5 rows:
     Name  Age       City
0  Hamza   23     Lahore
1    Ali   25    Karachi
2   Sara   22  Islamabad
3  Ahmed   24     Lahore
4  Usman   26   Peshawar

Last 5 rows:
     Name  Age       City
1    Ali   25    Karachi
2   Sara   22  Islamabad
3  Ahmed   24     Lahore
4  Usman   26   Peshawar
5  Adeel   28     Multan

First 3 rows:
     Name  Age       City
0  Hamza   23     Lahore
1    Ali   25    Karachi
2   Sara   22  Islamabad

Column Names:
 Index(['Name', 'Age', 'City'], dtype='object')

Info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    6 non-null      object
 1   Age     6 non-null      int64 
 2   City    6 non-null      object
dtypes: int64(1), object(2)
memory usage: 276.0+ bytes

Descriptive Statistics:
              Age
count   6.000000
mean   24.666667
std     2.160247
min    22.000000
25%    23.250000
50%    24.500000
75%

In [6]:
df.tail(3)  # display the last 3 records (df here is the 6x4 frame indexed by dates from 2013-01-01 to 2013-01-06)


Unnamed: 0,A,B,C,D
2013-01-04,0.700818,0.86887,-0.345051,-0.022051
2013-01-05,0.254263,1.334793,-1.945413,0.015426
2013-01-06,-0.626666,0.377629,0.328876,0.118453


In [7]:
df.index  # display the row index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [8]:
df.columns  # display the column labels

Index(['A', 'B', 'C', 'D'], dtype='object')

In [9]:
df.values  # the underlying data as a NumPy array

array([[ 0.9441851 , -1.13927235, -0.48422804,  1.3854395 ],
       [-0.97136305,  1.02900877, -1.00140222, -0.09428462],
       [ 0.28498348,  1.36390556,  0.57735353,  0.31031256],
       [ 0.70081807,  0.86886984, -0.34505057, -0.02205082],
       [ 0.25426297,  1.33479294, -1.94541297,  0.01542579],
       [-0.62666645,  0.37762879,  0.3288756 ,  0.11845257]])

In [10]:
# Transposing your data
df.T

Unnamed: 0,2013-01-01,2013-01-02,2013-01-03,2013-01-04,2013-01-05,2013-01-06
A,0.944185,-0.971363,0.284983,0.700818,0.254263,-0.626666
B,-1.139272,1.029009,1.363906,0.86887,1.334793,0.377629
C,-0.484228,-1.001402,0.577354,-0.345051,-1.945413,0.328876
D,1.38544,-0.094285,0.310313,-0.022051,0.015426,0.118453


In [11]:
#Sorting by an axis
df.sort_index(axis=1, ascending=False)

Unnamed: 0,D,C,B,A
2013-01-01,1.38544,-0.484228,-1.139272,0.944185
2013-01-02,-0.094285,-1.001402,1.029009,-0.971363
2013-01-03,0.310313,0.577354,1.363906,0.284983
2013-01-04,-0.022051,-0.345051,0.86887,0.700818
2013-01-05,0.015426,-1.945413,1.334793,0.254263
2013-01-06,0.118453,0.328876,0.377629,-0.626666


In [12]:
#Sorting by values
df.sort_values(by='B')

Unnamed: 0,A,B,C,D
2013-01-01,0.944185,-1.139272,-0.484228,1.38544
2013-01-06,-0.626666,0.377629,0.328876,0.118453
2013-01-04,0.700818,0.86887,-0.345051,-0.022051
2013-01-02,-0.971363,1.029009,-1.001402,-0.094285
2013-01-05,0.254263,1.334793,-1.945413,0.015426
2013-01-03,0.284983,1.363906,0.577354,0.310313


In [13]:
# Describe shows a quick statistic summary of your data
df.describe()

Unnamed: 0,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,0.097703,0.639156,-0.478311,0.285549
std,0.74933,0.942882,0.917582,0.556804
min,-0.971363,-1.139272,-1.945413,-0.094285
25%,-0.406434,0.500439,-0.872109,-0.012682
50%,0.269623,0.948939,-0.414639,0.066939
75%,0.596859,1.258347,0.160394,0.262348
max,0.944185,1.363906,0.577354,1.38544


In [14]:
#Selecting a single column, which yields a Series, equivalent to df.A
df['A']


2013-01-01    0.944185
2013-01-02   -0.971363
2013-01-03    0.284983
2013-01-04    0.700818
2013-01-05    0.254263
2013-01-06   -0.626666
Freq: D, Name: A, dtype: float64

In [16]:
# Slicing rows by date label; both endpoints are included
df['20130102':'20130104']

Unnamed: 0,A,B,C,D
2013-01-02,-0.971363,1.029009,-1.001402,-0.094285
2013-01-03,0.284983,1.363906,0.577354,0.310313
2013-01-04,0.700818,0.86887,-0.345051,-0.022051


In [17]:
#Selecting on a multi-axis by label
df.loc[:,['A','B']]


Unnamed: 0,A,B
2013-01-01,0.944185,-1.139272
2013-01-02,-0.971363,1.029009
2013-01-03,0.284983,1.363906
2013-01-04,0.700818,0.86887
2013-01-05,0.254263,1.334793
2013-01-06,-0.626666,0.377629


In [18]:
# Reduction in the dimensions of the returned object
df.loc['20130102',['A','B']]

A   -0.971363
B    1.029009
Name: 2013-01-02 00:00:00, dtype: float64

In [19]:
# For getting a scalar value
df.loc[dates[0],'A']

np.float64(0.9441851049655556)

In [20]:
# For getting fast access to a scalar
df.at[dates[0],'A']

np.float64(0.9441851049655556)

In [21]:
# Select via the position of the passed integers
df.iloc[3]


A    0.700818
B    0.868870
C   -0.345051
D   -0.022051
Name: 2013-01-04 00:00:00, dtype: float64

In [22]:
# By integer slices, acting similar to numpy/python
df.iloc[3:5,0:2]

Unnamed: 0,A,B
2013-01-04,0.700818,0.86887
2013-01-05,0.254263,1.334793


In [23]:
# By lists of integer position locations, similar to the numpy/python style
df.iloc[[1,2,4],[0,2]]


Unnamed: 0,A,C
2013-01-02,-0.971363,-1.001402
2013-01-03,0.284983,0.577354
2013-01-05,0.254263,-1.945413


In [24]:
# For slicing columns explicitly
df.iloc[:,1:3]

Unnamed: 0,B,C
2013-01-01,-1.139272,-0.484228
2013-01-02,1.029009,-1.001402
2013-01-03,1.363906,0.577354
2013-01-04,0.86887,-0.345051
2013-01-05,1.334793,-1.945413
2013-01-06,0.377629,0.328876


In [25]:
# For getting a value explicitly
df.iloc[1,1]


np.float64(1.0290087676244732)

In [26]:
# Using a single column’s values to select data.
df[df.A > 0]

Unnamed: 0,A,B,C,D
2013-01-01,0.944185,-1.139272,-0.484228,1.38544
2013-01-03,0.284983,1.363906,0.577354,0.310313
2013-01-04,0.700818,0.86887,-0.345051,-0.022051
2013-01-05,0.254263,1.334793,-1.945413,0.015426


In [27]:
# Selecting values from a DataFrame where a boolean condition is met.
df[df > 0]

Unnamed: 0,A,B,C,D
2013-01-01,0.944185,,,1.38544
2013-01-02,,1.029009,,
2013-01-03,0.284983,1.363906,0.577354,0.310313
2013-01-04,0.700818,0.86887,,
2013-01-05,0.254263,1.334793,,0.015426
2013-01-06,,0.377629,0.328876,0.118453
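
The two boolean selections above behave differently: a Series mask drops whole rows, while a DataFrame mask keeps the shape and blanks out individual cells. A minimal sketch on fixed values (the frame `vals` is made up for this illustration):

```python
import pandas as pd

vals = pd.DataFrame({"A": [1, -2, 3], "B": [-4, 5, 6]})

# Series mask: keeps only the rows where column A is positive.
rows_kept = vals[vals["A"] > 0]
print(rows_kept)

# DataFrame mask: same shape as the original, NaN where the condition is False.
masked = vals[vals > 0]
print(masked)
```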


In [28]:
# Using the isin() method for filtering:
df2 = df.copy()

In [29]:
df2['E'] = ['one', 'one','two','three','four','three']

In [30]:
df2

Unnamed: 0,A,B,C,D,E
2013-01-01,0.944185,-1.139272,-0.484228,1.38544,one
2013-01-02,-0.971363,1.029009,-1.001402,-0.094285,one
2013-01-03,0.284983,1.363906,0.577354,0.310313,two
2013-01-04,0.700818,0.86887,-0.345051,-0.022051,three
2013-01-05,0.254263,1.334793,-1.945413,0.015426,four
2013-01-06,-0.626666,0.377629,0.328876,0.118453,three
