### Python Concepts Overview

Types of Programming Languages:
* Procedural: C (most languages also support a procedural style)
* Functional: Haskell; Python and JavaScript support a functional style
* Object-Oriented: C++, C#, Java, Python ...

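Statically typed languages (e.g., C) and dynamically typed languages (e.g., Python)

Memory Management:
* Stack = variable names (references)
* Heap = the objects (data) themselves; references on the stack point to objects on the heap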
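
As a rough illustration of the stack/heap picture above, the sketch below binds two names to one list object. The variable names are made up for the example, and the behavior shown is that of standard CPython.

```python
# Two names can reference the same heap object.
a = [1, 2, 3]
b = a                     # no copy is made; b references the same list as a

same_object = a is b      # identity check: both names point to one object
b.append(4)               # mutate through b; the change is visible through a

c = list(a)               # list() builds a new, separate list object

print(same_object)        # True
print(a)                  # [1, 2, 3, 4]
print(c is a, c == a)     # False True
```

Assignment never copies an object; it only adds another reference to it.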

##### Python Concepts
* **Variables**: Containers that hold values.
* **Data Types**: Numbers, Strings, Lists, Tuples, Sets and Dictionaries.
* **Operators**: Arithmetic Operators (`+`, `-`, `*`, `/`, `//`, `%`, `**`), Comparison Operators (`==`, `!=`, `<`, `>`, `<=`, `>=`) and Logical Operators (`and`, `or`, `not`).
* **Conditional Statements**: `if`, `elif`, `else`.
* **Loops**:
  - **for**: Iterate over the items of a sequence or other iterable.
  - **while**: Loop as long as a condition remains true.
  - **Loop Control Statements**: break (exits the loop), continue (skips the current iteration and moves to the next one), pass (does nothing; placeholder)
* **Exception Handling**:
  - **try**: Attempt a block of code.
  - **except**: Run this if try fails.
  - **else**: Run this if no exception occurs.
  - **finally**: Run this no matter what.
* **Functions**: Reusing a block of code.
  - **Positional Arguments**: Arguments that must be passed in order.
  - **Variable-Length Arguments**: `*args` for positional arguments, `**kwargs` for keyword arguments.
* **Lambda Functions**: Anonymous, throwaway functions, mostly used for single-line operations.
* **OOP**:
  - **Classes**: Blueprints for creating objects.
  - **Methods**: Functions defined inside classes.
  - **Instances**: Objects created from classes.
* **Generators**: Functions that return an iterator using the `yield` statement, allowing iteration over values without storing them in memory.
* **Expressions**: Combinations of values and operators that produce a result.
* **Closures**: Functions that capture the local state of variables from their enclosing scope.
* **Regex**: Regular expressions used for string searching and manipulation.
* **Decorators**: Functions that modify the behavior of another function.
* **Iterators**: Objects that implement the iterator protocol, allowing traversal through a container.
* **Functional Programming**: A programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data.
* **Map, Reduce, Filter**: Higher-order functions for processing collections:
  - **Map**: Applies a function to all items in an iterable.
  - **Reduce**: Reduces an iterable to a single value by applying a binary function cumulatively.
  - **Filter**: Filters items in an iterable based on a function that returns True or False.
* **Threading**: Allows concurrent execution of code by running multiple threads (smaller units of a process).
* **Magic Methods**: Special methods in Python that begin and end with double underscores (e.g., `__init__`, `__str__`, `__len__`) that enable the use of operators with user-defined classes.
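
Closures, the iterator protocol, and `reduce` are listed above but are easier to grasp from a small example. The following is a minimal sketch; `make_counter` and `Countdown` are hypothetical names chosen for illustration.

```python
from functools import reduce

# Closure: the inner function remembers `count` from the enclosing call.
def make_counter():
    count = 0
    def increment():
        nonlocal count        # rebind the variable in the enclosing scope
        count += 1
        return count
    return increment

counter = make_counter()
first, second = counter(), counter()

# Iterator protocol: __iter__ returns the iterator itself, __next__ returns
# the next value and raises StopIteration when there is nothing left.
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

countdown_values = list(Countdown(3))

# reduce: fold an iterable into one value with a two-argument function.
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4], 0)

print(first, second)       # 1 2
print(countdown_values)    # [3, 2, 1]
print(total)               # 10
```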


## History and Advantages

* **1991**: Python was first released by Guido van Rossum, who began implementing it in late 1989 (Python 3 was released in 2008).
* **Easy Readability**: Python emphasizes code readability, making it easier to understand and maintain.
* **Easy to Use**: Python has a simple syntax that allows developers to write less code to accomplish tasks.
* **Why Python**: It optimizes developer time over computer processing time, focusing on productivity.
* **Case-Sensitive**: Python is case-sensitive, meaning variable names are distinct based on their letter casing.
* **Dynamically Typed**: Variable types are determined at runtime, so a name can be rebound to a value of a different type.
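
The last two points can be shown in a few lines. This is only a small sketch with made-up variable names.

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
type_before = type(value).__name__
value = "forty-two"
type_after = type(value).__name__

# Case sensitivity: `total` and `Total` are two separate variables.
total = 1
Total = 2

print(type_before, type_after)  # int str
print(total, Total)             # 1 2
```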


## Object and Data Structure Basics:

| Object Type  | Sequence | Ordered                | Mutable | Duplicates            | Heterogeneous | Unique    |
|--------------|----------|------------------------|---------|-----------------------|---------------|-----------|
| Numbers      | No       | No                     | No      | No                    | No            | No        |
| Strings      | Yes      | Yes                    | No      | Yes                   | No            | No        |
| Lists        | Yes      | Yes                    | Yes     | Yes                   | Yes           | No        |
| Dictionaries | No       | Insertion order (3.7+) | Yes     | Keys: No; values: Yes | Yes           | Keys only |
| Sets         | No       | No                     | Yes     | No                    | Yes           | Yes       |
| Tuples       | Yes      | Yes                    | No      | Yes                   | Yes           | No        |
| Files        | No       | No                     | Yes     | No                    | No            | No        |
| Booleans     | No       | No                     | No      | No                    | No            | No        |

## Mutable vs Immutable

* **Mutable**:
  - Objects that can be changed in place after they are created.
  - For example, if you create a list with 3 items, you can change the first item to something else. The memory location remains the same.

* **Immutable**:
  - Objects that cannot change after they are created.
  - For instance, if a string is bound to a variable `x` and you "modify" it (e.g., convert it to uppercase), Python creates a new string object, so the memory location before and after is not the same.
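
A minimal sketch of the difference, assuming standard CPython behavior for `id()`; the names `items`, `x`, and `y` are illustrative only.

```python
# Mutable: an in-place change keeps the same object identity.
items = [1, 2, 3]
id_before = id(items)
items[0] = 99
list_same_object = id(items) == id_before

# Immutable: str.upper() returns a new string; the original is untouched.
x = "hello"
y = x.upper()

print(list_same_object)  # True
print(x, y)              # hello HELLO
print(x is y)            # False
```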

#### Numbers

* **PEMDAS**: The order of operations in mathematics.
* **Order of Operations**:
  - **P**: Parentheses `()`
  - **E**: Exponents `**`
  - **M**: Multiplication `*`
  - **D**: Division `/`
  - **A**: Addition `+`
  - **S**: Subtraction `-`

* **Note**: Multiplication and division have the same priority, so whichever appears first from left to right is performed first. The same holds for addition and subtraction.
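
The rules above can be checked with a few worked expressions. The variable names are placeholders; the last line shows one extra detail, namely that `**` groups from right to left.

```python
result_a = 2 + 3 * 4 ** 2      # 4**2 = 16, then 3*16 = 48, then 2+48 = 50
result_b = (2 + 3) * 4 ** 2    # parentheses first: 5 * 16 = 80
result_c = 8 / 4 * 2           # left to right: (8/4)*2 = 4.0
result_d = 10 - 4 + 2          # left to right: (10-4)+2 = 8
result_e = 2 ** 3 ** 2         # right to left: 2**(3**2) = 512

print(result_a, result_b, result_c, result_d, result_e)
```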


In [None]:
# Immutable
a = 5
print(id(a))  # Memory location of 5
a = a + 1
print(id(a))  # New memory location for 6

132702868324720
132702868324752


In [None]:
# Number Operations in Python
print(2**3)   # Exponentiation: 2 raised to the power of 3 (Output: 8)
print(2*3)    # Multiplication: 2 times 3 (Output: 6)
print(6/3)    # Division: 6 divided by 3 (Output: 2.0, float)
print(5//3)   # Floor Division: 5 divided by 3, result rounded down (Output: 1, integer)
print(6+3)    # Addition: 6 plus 3 (Output: 9)
print(6-3)    # Subtraction: 6 minus 3 (Output: 3)

8
6
2.0
1
9
3


#### String

* **Ordered Sequences**: Strings are ordered collections of characters, allowing for **indexing** and **slicing** to access sub-sections of the string.
* **Immutable**: Strings cannot be changed after they are created. Any modification results in the creation of a new string.
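
To make the immutability point concrete, the sketch below tries to change one character in place and then builds a new string instead. The variable names are illustrative.

```python
word = "hello"
try:
    word[0] = "H"              # strings do not support item assignment
except TypeError as exc:
    error_message = str(exc)

capitalized = "H" + word[1:]   # build a new string from slices instead

print(error_message)
print(word, capitalized)       # hello Hello
```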


In [None]:
# Immutable
s = "hello"
print(id(s))  # Memory location of "hello"
s = s + " world"
print(id(s))  # New memory location for "hello world"

132702441883952
132702411069040


In [None]:
# Indexing (Grabbing Single Character)
name = 'Vinay Achanta'
print(name[0])
print(name[-1])

# Slicing (grab a sub-section): [start:stop:step], stop is exclusive, and a negative step goes in reverse
print(name[0:5:1])
print(name[:13])
print(name[8:])
print(name[0:13:2])
print(name[::-1]) # Reverse a String

V
a
Vinay
Vinay Achanta
hanta
VnyAhna
atnahcA yaniV


In [None]:
# String Built-in Methods
print(name)
print(name.upper())
print(name.lower())
print(name.split()) # Default split on White Space into a List
print(name.split('y'))

Vinay Achanta
VINAY ACHANTA
vinay achanta
['Vinay', 'Achanta']
['Vina', ' Achanta']


In [None]:
# .format() Method
score = 10
height = 1.8123456
print("Your Score is {}, your height is {}".format(score, height))
print("Your Score is {0}, your height is {1}".format(score, height))
print("Your Score is {a}, your height is {b}".format(a=score, b=height))
print("Your height is {h:10.2f}".format(h=height)) # {value:width.precisionf}

# F-String
print(f"Your Score is {score}, your height is {height}")
print(f"Your height is {height:.2f}")

Your Score is 10, your height is 1.8123456
Your Score is 10, your height is 1.8123456
Your Score is 10, your height is 1.8123456
Your height is       1.81
Your Score is 10, your height is 1.8123456
Your height is 1.81


#### Lists `[ ]` or `list()`

* **Ordered Sequences**: Lists are ordered collections, allowing for **indexing** and **slicing** to access sub-sections.
* **Heterogeneous**: Lists can hold a variety of object types, meaning they can contain different data types (e.g., integers, strings, lists).
* **Mutable**: Lists can be modified after creation, allowing you to change, add, or remove items.
* **Holds Duplicates**: Lists can contain duplicate elements.


In [None]:
# Mutable
lst = [1, 2, 3]
print(id(lst))  # Memory location of list
lst.append(4) # Appended 4 to existing list
print(id(lst))  # Memory location is the same after modification

132702411071232
132702411071232


In [None]:
# Indexing, Slicing and Concatenation
my_list = [1, 'Vinay Achanta', 'A02395874', 4353745117]
my_list_2 = [2, 'Hari Chandana', 'A02396013', 4353635131]

print(len(my_list))
print(my_list[1])
print(my_list[-1])

print(my_list[1:])
print(my_list[0:3:2])
print(my_list[::-1])

new_list = my_list + my_list_2
print(new_list)

new_list[5] = 'Hari Chandana Kotnani'
print(new_list)

4
Vinay Achanta
4353745117
['Vinay Achanta', 'A02395874', 4353745117]
[1, 'A02395874']
[4353745117, 'A02395874', 'Vinay Achanta', 1]
[1, 'Vinay Achanta', 'A02395874', 4353745117, 2, 'Hari Chandana', 'A02396013', 4353635131]
[1, 'Vinay Achanta', 'A02395874', 4353745117, 2, 'Hari Chandana Kotnani', 'A02396013', 4353635131]


In [None]:
# Lists Built-in Functions (append(), remove(), extend(), insert(), pop(), sort(), min(), max(), reverse(), range())
new_list.append('Append_Test') # Append items at the end of the existing list
print(new_list)

original_list = [1,2,3,4] # Extend new lists to the existing lists at the end
additional_elements = [5,6,7,8]
original_list.extend(additional_elements)
print(original_list)

original_list.insert(5, 'After 5th Element') # Insert items at an index
print(original_list)

print(new_list.pop()) # Pop the last item
print(new_list.pop(0)) # Pop the indexed item
print(new_list)

list_sort = [1,5,2,6,3,7,4,8,9,10]

print(max(list_sort)) # Max

print(min(list_sort)) # Min

list_sort.sort() # Sort In-Place
print(list_sort)

print(sorted(list_sort)) # Sort Function Not In-Place

list_sort.reverse() # Reverse In-Place
print(list_sort)

range_list = list(range(30)) # Range generated items inside a list
print(range_list)
print(range_list[0::5]) # Step 5

[1, 'Vinay Achanta', 'A02395874', 4353745117, 2, 'Hari Chandana Kotnani', 'A02396013', 4353635131, 'Append_Test']
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 'After 5th Element', 6, 7, 8]
Append_Test
1
['Vinay Achanta', 'A02395874', 4353745117, 2, 'Hari Chandana Kotnani', 'A02396013', 4353635131]
10
1
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[0, 5, 10, 15, 20, 25]


#### Dictionaries `{ }` or `dict()`

* **Mappings**: Dictionaries are collections that store objects as key-value pairs. Since Python 3.7 they preserve insertion order, but items are still looked up by key, not by position.
* **Key:Value Pairs**: Each item in a dictionary consists of a key and its associated value, and dictionaries are mutable (modifiable).
* **No Slicing or Positional Indexing**: Dictionaries cannot be sliced, and items cannot be accessed by position. They have no `sort()` method, although `sorted()` can return their keys or items in sorted order. They are ideal for retrieving objects by key rather than by location.
* **Unique Keys**: Each key in a dictionary must be unique; duplicate keys are not allowed.
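
The points above can be tried out in a short sketch. The `grades` dictionary and its keys are illustrative, and the ordering shown assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
grades = {'Vinay': 3.9, 'Tbone': 3.6}
grades['Fa2'] = 3.7                      # a new key is added at the end

key_order = list(grades)                 # keys in insertion order
missing = grades.get('Nobody', 0.0)      # .get() avoids KeyError for absent keys
has_key = 'Tbone' in grades              # membership tests check keys

# A dict has no .sort(), but sorted() can order its items into a new list.
by_grade = sorted(grades.items(), key=lambda pair: pair[1])

grades['Vinay'] = 4.0                    # assigning to an existing key overwrites it

print(key_order)    # ['Vinay', 'Tbone', 'Fa2']
print(missing)      # 0.0
print(by_grade)     # [('Tbone', 3.6), ('Fa2', 3.7), ('Vinay', 3.9)]
print(len(grades))  # 3
```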


In [None]:
# Mutable
d = {'a': 1, 'b': 2}
print(id(d))  # Memory location of dict
d['c'] = 3
print(id(d))  # Memory location is the same after modification

132701639955648
132701639955648


In [None]:
my_dict = {'Vinay':3.9, 'Chandana':3.9, 'Tbone':3.6, 'Fa2':3.7}

for i in my_dict.keys():
  print(i)

for i in my_dict.values():
  print(i)

for i in my_dict.items():
  print(i)

Vinay
Chandana
Tbone
Fa2
3.9
3.9
3.6
3.7
('Vinay', 3.9)
('Chandana', 3.9)
('Tbone', 3.6)
('Fa2', 3.7)


In [None]:
my_dict_1 = {'Apple':{'Color':'Red', 'Price':100}, 'Mango':{'Color':'Yellow', 'Price':200}}

print(my_dict_1['Apple'])
print(my_dict_1['Apple']['Color'])

my_dict_1['Grapes'] = {'Color':'Green', 'Price':'100'} # Adding New Pair
print(my_dict_1)

my_dict_1['Grapes']['Price'] = 10 # Overwriting Existing Pair
print(my_dict_1)

{'Color': 'Red', 'Price': 100}
Red
{'Apple': {'Color': 'Red', 'Price': 100}, 'Mango': {'Color': 'Yellow', 'Price': 200}, 'Grapes': {'Color': 'Green', 'Price': '100'}}
{'Apple': {'Color': 'Red', 'Price': 100}, 'Mango': {'Color': 'Yellow', 'Price': 200}, 'Grapes': {'Color': 'Green', 'Price': 10}}


#### Sets `{ }` or `set()`

* **Unordered Collection**: Sets are collections that do not maintain any order.
* **Unique Elements**: Sets only store unique elements; duplicate values are automatically removed.
* **Empty Set**: `{}` creates an empty dictionary, not a set; use `set()` to create an empty set.
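
Because sets hold unique elements, they support the usual mathematical set operations. The sketch below uses two made-up sets and also shows that `{}` produces a dictionary rather than a set.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b            # elements in either set
intersection = a & b     # elements in both sets
difference = a - b       # elements in a but not in b

empty_dict = {}          # {} is an empty dict, not a set
empty_set = set()

print(sorted(union), sorted(intersection), sorted(difference))
print(type(empty_dict).__name__, type(empty_set).__name__)  # dict set
```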


In [None]:
# Mutable
s = {1, 2, 3}
print(id(s))  # Memory location of set
s.add(4)
print(id(s))  # Memory location is the same after modification

132702438492416
132702438492416


In [None]:
my_set = set()

my_set.add(1) # Add Number to Set
print(my_set)

my_set.add(2)
print(my_set)

my_set.add(2) # No Duplicates
print(my_set)

my_list = [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4] # Cast a List into Set to Make it Unique
my_set_1 = set(my_list)
print(my_set_1)

{1}
{1, 2}
{1, 2}
{1, 2, 3, 4}


#### Tuples `()` or `tuple()`

* **Similarity to Lists**: Tuples are similar to lists, but the key difference is that tuples are **immutable**.
* **Different Data Types**: Tuples can contain elements of different data types.
* **No Item Assignment**: Lists support item assignment (e.g., `my_list[0] = 'new'` works), while tuples do not (e.g., `my_tuple[0] = 'new'` results in an error).
* **Usage**: Use tuples when you want to ensure that your items do not get accidentally changed or reassigned.
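
A short sketch of the item-assignment rule, with illustrative names. It also shows one practical consequence of immutability: a tuple of hashable items can be used as a dictionary key, which a list cannot.

```python
point = (3, 4)
try:
    point[0] = 10                 # tuples do not support item assignment
except TypeError as exc:
    tuple_error = str(exc)

# Tuples of hashable items are hashable, so they can be dictionary keys.
distances = {(0, 0): 0.0, (3, 4): 5.0}
lookup = distances[point]

print(tuple_error)
print(lookup)   # 5.0
```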


In [None]:
# Immutable
tup = (1, 2, 3)
print(id(tup))  # Memory location of tuple
tup = tup + (4,)
print(id(tup))  # New memory location for modified tuple

132702411066880
132702845538192


In [None]:
my_tuple = (1,2,3,4,5,'Vinay', 1, 1, 1)
print(my_tuple[5])

Vinay


In [None]:
# Tuple Built-in Methods (count(), index())
print(my_tuple.count(1)) # Count
print(my_tuple.index(1)) # Index of First Occurrence

4
0


In [None]:
# Tuple Unpacking
unpack = [(1,2), (3,4), (5,6), (7,8)]
for a,b in unpack:
  print(a)
  print('----')
  print(b)
  print('-------------')

1
----
2
-------------
3
----
4
-------------
5
----
6
-------------
7
----
8
-------------


#### Booleans

* **Definition**: Booleans represent one of two values: **True** or **False**.


In [None]:
type(True)

bool

#### Files

* **Built-In Functions**: Common file operations include:
  - `read()`: Reads the entire file.
  - `readlines()`: Reads the file and returns a list of lines.
  - `write()`: Writes data to the file.

* **Modes**:
  - **`r`**: Read-only mode.
  - **`w`**: Write-only mode (overwrites existing files or creates a new one).
  - **`a`**: Append-only mode (adds data to the end of the file).
  - **`r+`**: Reading and writing mode.
  - **`w+`**: Writing and reading mode (overwrites existing files or creates a new one).
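
The modes and methods above can be exercised safely in a temporary directory. This is a minimal sketch; the file name `demo.txt` is hypothetical and no real file is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')   # hypothetical file name

    with open(path, mode='w') as f:    # 'w' creates or truncates the file
        f.write('first line\n')

    with open(path, mode='a') as f:    # 'a' appends to the end
        f.write('second line\n')

    with open(path, mode='r') as f:
        whole_text = f.read()          # entire file as one string

    with open(path, mode='r') as f:
        lines = f.readlines()          # list of lines, newlines included

print(repr(whole_text))
print(lines)
```

Using `with` closes each file automatically, even if an error occurs inside the block.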


In [None]:
with open('Internship.txt', mode='a') as my_file:
  my_file.write('\nTestWrite')

with open('Internship.txt', mode='r') as my_file:
  contents = my_file.read() # Keep the file handle and its contents in separate names
  print(contents)

•Strategic Data Analysis and Visualization: Championed the analysis and visualization of Study Abroad and Undergraduate programs data at Utah State University, leveraging Tableau to identify key trends and optimize engagement strategies. My innovative approach to data presentation directly contributed to a 30% surge in Study Abroad enrollments through improved promotional campaigns and precise targeting.
•Optimized Data Retrieval for Admissions: Revolutionized undergraduate admissions data retrieval by deploying customized queries in Terradotta Analytics, reducing extraction time from hours to minutes. The streamlined process not only accelerated workflow but also heightened the precision of visa student profile analysis. The swift and accurate data retrieval facilitated advanced visualizations and trend analysis, equipping the admissions team with critical insights for an enhanced strategic approach to international student recruitment.
•Website Development for Global Education: Spear

## Comparison Operators

* **Basic Operators**:
  - **`==`**: Equal to (a single `=` is assignment, not comparison).
  - **`<`**: Less than.
  - **`>`**: Greater than.
  - **`!=`**: Not equal to.

* **Logical Operators** (used to combine comparisons):
  - **`and`**: Returns True if both conditions are true.
  - **`or`**: Returns True if at least one condition is true.
  - **`not`**: Negates a condition (returns True if the condition is false).

The examples in the table below assume `a < b`, for instance `a = 10` and `b = 20`.


<table class="table table-bordered">
<tr>
<th style="width:10%">Operator</th><th style="width:45%">Description</th><th>Example</th>
</tr>
<tr>
<td>==</td>
<td>If the values of two operands are equal, then the condition becomes true.</td>
<td> (a == b) is not true.</td>
</tr>
<tr>
<td>!=</td>
<td>If values of two operands are not equal, then condition becomes true.</td>
<td> (a != b) is true.</td>
</tr>
<tr>
<td>&gt;</td>
<td>If the value of left operand is greater than the value of right operand, then condition becomes true.</td>
<td> (a &gt; b) is not true.</td>
</tr>
<tr>
<td>&lt;</td>
<td>If the value of left operand is less than the value of right operand, then condition becomes true.</td>
<td> (a &lt; b) is true.</td>
</tr>
<tr>
<td>&gt;=</td>
<td>If the value of left operand is greater than or equal to the value of right operand, then condition becomes true.</td>
<td> (a &gt;= b) is not true. </td>
</tr>
<tr>
<td>&lt;=</td>
<td>If the value of left operand is less than or equal to the value of right operand, then condition becomes true.</td>
<td> (a &lt;= b) is true. </td>
</tr>
</table>
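
The table's examples can be reproduced directly. The values `a = 10` and `b = 20` are an assumption (any pair with `a < b` gives the same results). The sketch also shows a chained comparison, which is shorthand for two comparisons joined by `and`.

```python
a, b = 10, 20   # assumed values; the table only implies a < b

results = {
    'a == b': a == b,
    'a != b': a != b,
    'a > b': a > b,
    'a < b': a < b,
    'a >= b': a >= b,
    'a <= b': a <= b,
}
for expression, value in results.items():
    print(f'{expression}: {value}')

in_range = 5 < a < 15          # same as (5 < a) and (a < 15)
out_of_range = 5 < b < 15      # b is not below 15

print(in_range, out_of_range)  # True False
```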

In [None]:
# Basic Operators
print(2==2)
print(2==1)
print('hi'=='hi')
print('Hi'=='hi') # Case Sensitive
print(2.0==2)

True
False
True
False
True


In [None]:
# Combining Comparisons with Logical Operators
print(1<2 and 2>3)
print(1<2 or 2>3)
print(not(1<2 or 2>3)) # 'not' gives the opposite Boolean value

False
True
False


## Conditionals

* **`if`**: Executes a block of code if a specified condition is true.
* **`elif`**: Stands for "else if," allowing the checking of multiple conditions.
* **`else`**: Executes a block of code if none of the previous conditions are true.

In [None]:
# if, elif, else
weight_kg = 69

if weight_kg <= 69:
  print("Weight is 69 kg or less")
elif weight_kg >= 70:
  print("Weight is 70 kg or more")
else:
  print("Weight is between 69 and 70 kg") # Reached only for fractional values such as 69.5

Weight is 69 kg or less


## Loops

* **For Loop**: Iterates over a sequence (like a list, tuple, or string) or other iterable objects.
* **While Loop**: Repeatedly executes a block of code as long as a specified condition is true.


In [None]:
# for loop
# iterable: Lists, Tuples, Strings, Sets, Dictionaries, Ranges, Enumerate --- not iterable: Numbers, Booleans

# Lists
my_list = [1,2,3,4,5]
for i in my_list:
    print('List', i)

# Tuples
my_tuple = (1, 2, 3, 4)
for item in my_tuple:
    print('Tuple', item)

# Strings
my_string = "Vinay"
for char in my_string:
    print('String', char)

# Sets
my_set = {1, 2, 3, 4}
for item in my_set:
    print('Set', item)

# Dictionaries
my_dict = {'dict1': 1, 'dict2': 2, 'dict3': 3}
for key in my_dict.keys(): # Iterating over keys
    print('Dict Key', key)
for value in my_dict.values(): # Iterating over values
    print('Dict Value', value)
for key, value in my_dict.items(): # Iterating over items (key-value pairs)
    print('Dict Key Value', key, value)

# range() Start:Stop(Exclusive):Step
for i in range(0, 11, 2):
  print(i)

# Enumerate
my_list = ['a', 'b', 'c']
for index, value in enumerate(my_list):
  print(f'Enumerate - Index: {index} Value: {value}')

List 1
List 2
List 3
List 4
List 5
Tuple 1
Tuple 2
Tuple 3
Tuple 4
String V
String i
String n
String a
String y
Set 1
Set 2
Set 3
Set 4
Dict Key dict1
Dict Key dict2
Dict Key dict3
Dict Value 1
Dict Value 2
Dict Value 3
Dict Key Value dict1 1
Dict Key Value dict2 2
Dict Key Value dict3 3
0
2
4
6
8
10
Enumerate - Index: 0 Value: a
Enumerate - Index: 1 Value: b
Enumerate - Index: 2 Value: c


In [None]:
# while loop
x = 0
while x <= 10:
  print(f'Value of x is {x}')
  x += 1

# Even Numbers using While
x = 0
while x <= 21:
  if x % 2 == 0:
    print(x)
  x += 1

Value of x is 0
Value of x is 1
Value of x is 2
Value of x is 3
Value of x is 4
Value of x is 5
Value of x is 6
Value of x is 7
Value of x is 8
Value of x is 9
Value of x is 10
0
2
4
6
8
10
12
14
16
18
20


## Control Statements

* **Break**: Breaks out of the current closest enclosing loop.
* **Continue**: Skips the current iteration and goes to the top of the closest enclosing loop.
* **Pass**: Does nothing; serves as a placeholder where Python's syntax requires a statement, which avoids a syntax error for an otherwise empty block.


In [None]:
# Break
name = 'Vinay'
for i in name:
  if i=='a':
    break
  print(i)

print('---------')

# Continue
name = 'Vinay'
for i in name:
  if i=='a':
    continue
  print(i)

print('---------')

# Pass
for i in name:
  pass

V
i
n
---------
V
i
n
y
---------


## List Comprehension

In [None]:
# List Comprehension: write a small loop that builds a list in one line (set and dict comprehensions work the same way)
# Increases development speed
list1 = []
for i in 'vinay':
  list1.append(i)
print(list1)

list1 = [i for i in 'vinay'] # Easy Replacement
print(list1)

squares = [i**2 for i in range(0,11)] # Squares
print(squares)

ie = [x if x%2==0 else 'odd' for x in range(0,11)] # if-else using List Comprehension
print(ie)

nl = [x * y for x in [2,3,4] for y in [2,10,100] ] # List Comprehension for nested loops
print(nl)

['v', 'i', 'n', 'a', 'y']
['v', 'i', 'n', 'a', 'y']
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
[0, 'odd', 2, 'odd', 4, 'odd', 6, 'odd', 8, 'odd', 10]
[4, 20, 200, 6, 30, 300, 8, 40, 400]
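
The same one-line syntax extends to sets and dictionaries. The sketch below uses an arbitrary example word. Note that parentheses do not give a "tuple comprehension"; they create a lazy generator expression, which can be passed to `tuple()`.

```python
word = 'banana'

unique_letters = {ch for ch in word}                        # set comprehension
letter_counts = {ch: word.count(ch) for ch in set(word)}    # dict comprehension
evens = [n for n in range(10) if n % 2 == 0]                # filter with a trailing `if`
squares_tuple = tuple(n ** 2 for n in range(5))             # generator expression fed to tuple()

print(sorted(unique_letters))  # ['a', 'b', 'n']
print(letter_counts['a'])      # 3
print(evens)                   # [0, 2, 4, 6, 8]
print(squares_tuple)           # (0, 1, 4, 9, 16)
```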


## Errors and Exception Handling:
* Error
* Exception
* try
* except
* finally

In [None]:
def divide_numbers(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print("Error: Cannot divide by zero!")
    except TypeError:
        print("Error: Invalid data type for division!")
    else:
        print(f"The result of {x} divided by {y} is: {result}")
    finally:
        print("This block always executes, whether there's an exception or not.")

divide_numbers(10, 2)
print('--------')
divide_numbers(10, 0)  # This will trigger a ZeroDivisionError
print('--------')
divide_numbers("abc", 2)  # This will trigger a TypeError

The result of 10 divided by 2 is: 5.0
This block always executes, whether there's an exception or not.
--------
Error: Cannot divide by zero!
This block always executes, whether there's an exception or not.
--------
Error: Invalid data type for division!
This block always executes, whether there's an exception or not.


## Built-In Functions:
* map
* filter
* zip
* all and any
* complex
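
Since `all`, `any`, and `complex` are listed here, a brief sketch may help. The `scores` list is made up for illustration.

```python
scores = [72, 85, 90]

all_passed = all(score >= 50 for score in scores)    # True only if every item is truthy
any_perfect = any(score == 100 for score in scores)  # True if at least one item is truthy
all_of_empty = all([])                               # True for an empty iterable
any_of_empty = any([])                               # False for an empty iterable

z = complex(3, 4)          # the complex number 3 + 4j
magnitude = abs(z)         # sqrt(3**2 + 4**2) = 5.0

print(all_passed, any_perfect)     # True False
print(all_of_empty, any_of_empty)  # True False
print(z.real, z.imag, magnitude)   # 3.0 4.0 5.0
```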

In [None]:
# Useful Operators in Python

# Zip
list1 = ['a','b','c','d']
list2 = ['e','f','g','h']
list3 = ['i','j','k','l']
for i, j, k in zip(list1,list2,list3):
  print(f'Zip {i}, {j}, {k}')

# In Operator
print(4 in [1,2,3])
print(3 in [1,2,3])
print('a' in 'Vinay')

d = {'a':1,'b':2,'c':3,}
print('c' in d.keys())
print(4 in d.values())

# Random Library

# Shuffle
from random import shuffle
list1 = [1,2,3,4,5,6]
shuffle(list1) # In-Place Function
list1

# Randint
from random import randint
randint(0,100)

Zip a, e, i
Zip b, f, j
Zip c, g, k
Zip d, h, l
False
True
True
True
False


77

## Methods and Functions:
* Methods
* Functions
* Lambda Expressions
* Nested Statements
* Scope: LEGB (Local, Enclosing, Global, Built-In)
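
Python resolves a name by searching the Local, Enclosing, Global, and Built-in scopes in that order. The sketch below illustrates each level; all function and variable names are hypothetical.

```python
name = 'global'                 # G: module-level (global) scope

def outer():
    name = 'enclosing'          # E: scope of the enclosing function

    def inner():
        name = 'local'          # L: scope of the current function
        return name

    def inner_no_local():
        return name             # no local `name`, so the enclosing one is found

    return inner(), inner_no_local()

def uses_global():
    return name                 # no local or enclosing `name`, so the global is used

local_value, enclosing_value = outer()
global_value = uses_global()
builtin_value = len('abc')      # B: `len` is found in the built-in scope

print(local_value, enclosing_value, global_value, builtin_value)
```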

## Methods vs Functions:

* **Object Association:** Functions are standalone and not tied to any specific object.
Methods are associated with objects and act on the data of those objects.

* **Syntax:** Both are defined with the `def` keyword; a method is defined inside a class.
Methods are called on objects using the dot notation.

* **Invocation:** Functions are called independently of any object.
Methods are called on objects and operate on the data of those objects.

* **Example:** len() is a function that can be applied to various data types (strings, lists, etc.).
.append() is a method specific to lists, and it operates on the list it is called on.

In [None]:
# Functions (Function Name Snake Casing: Lower Case with Underscores)

# Print vs Return in Function
def print_result(num1=0, num2=0): # Default values as 0
    print(num1 + num2)
result = print_result(1,2)
print(result) # print_result returns None, so nothing useful is stored in the result variable
type(result)

def return_result(num1, num2):
    return num1 + num2
result = return_result(1,2)
print(result) # Saves results in results variable
type(result)

3
None
3


int

In [None]:
# *args:arguments **kwargs:keywordarguments
def try_args(*args): # Takes in as many inputs as we want without defining them
  return sum(args) * 0.05
print(try_args(1000,2000,1000,1000))

def try_kwargs(**kwargs): # Uses Dictionary
  print(kwargs)
try_kwargs(fruit='Apple', veggie='Potato')

def try_args_kwargs(*args, **kwargs):
  print(args)
  print(kwargs)
  print(f"I want {args[0]} {kwargs['food']}")
try_args_kwargs(10,20,30,food='mango', animal='apple')

250.0
{'fruit': 'Apple', 'veggie': 'Potato'}
(10, 20, 30)
{'food': 'mango', 'animal': 'apple'}
I want 10 mango


In [None]:
# Map Function (Apply the Function to every element in the List)
def square(num):
  return num**2
my_list = [2,3,4,5,6]
print(list(map(square, my_list))) # Map

# Filter Function (Filters based on the given Function Condition)
def even_check(num): # This function just gives true or false
  return num%2==0
print(list(filter(even_check, my_list))) # Filter

[4, 9, 16, 25, 36]
[2, 4, 6]


In [None]:
# Lambda Expressions (Anonymous Functions, mostly for a function that is used only once; Saves Space)
print(list(map(lambda num: num ** 2, my_list))) # Map
print(list(filter(lambda num: num%2==0, my_list))) # Filter

names = ['Vinay', 'Chandana'] # Example
print(list(map(lambda x:x[0], names)))

[4, 9, 16, 25, 36]
[2, 4, 6]
['V', 'C']


## Object Oriented Programming (OOP):
* Objects: Everything in Python is an object. Even built-in functions and `type` itself are instances of some class.
* Classes: The built-in `list` data structure is a class; `append` and `extend` are methods of that class, and `len()` works on a list because the class defines the special method `__len__`.
* Methods: Functions defined inside a class are called methods.
* Inheritance
* Special Methods (`__str__`, `__len__`, `__del__`)
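
As a rough sketch of how special methods hook into built-ins, the hypothetical `Playlist` class below defines `__str__` and `__len__`. `__del__` is left out because the moment it runs depends on when the object is garbage collected.

```python
class Playlist:
    """Hypothetical class used only to illustrate special methods."""

    def __init__(self, title, songs):
        self.title = title
        self.songs = list(songs)

    def __str__(self):            # used by print() and str()
        return f'Playlist: {self.title}'

    def __len__(self):            # used by len()
        return len(self.songs)

mix = Playlist('Road Trip', ['Song A', 'Song B', 'Song C'])
print(mix)        # Playlist: Road Trip
print(len(mix))   # 3
```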

In [None]:
class NameOfClass(): # Class (CapWords, also called upper camel case)
  def __init__(self, param1, param2): # __init__ is a special method called automatically when an instance is created
    self.param1 = param1
    self.param2 = param2

  def some_method(self):
    # perform some action
    print(self.param1)

In [None]:
# OOPS Classes
class Cars():

  id = '0001' # Class Object Attribute (Like Global inside Classes)

  def __init__(self, brandname, carname, year, millage):
    self.brandname = brandname
    self.carname = carname
    self.year = year
    self.millage = millage

  def getinfo(self):
    print(f"The car name is {self.brandname} {self.carname}. It's made in {self.year} and gives a mileage of {self.millage}.")

TataHarrier = Cars(brandname='Tata', carname='Harrier', year=2023, millage=15) # Instance of a class.
TataHarrier.getinfo()

The car name is Tata Harrier. It's made in 2023 and gives a mileage of 15.


In [None]:
# Inheritance
class SportsCars(Cars):
  def __init__(self, brandname, carname, year, millage):
      super().__init__(brandname, carname, year, millage)
# OR
class SportsCars(Cars):
  def getinfo(self):
    print(f"The car name is {self.brandname} {self.carname}. It's made in {self.year} and gives a mileage of {self.millage}.")

Ferrari812 = SportsCars(brandname='Ferrari', carname='812 SuperFast', year=2022, millage=5)
Ferrari812.getinfo()

The car name is Ferrari 812 SuperFast. It's made in 2022 and gives a mileage of 5.


## Polymorphism
Polymorphism in a simple way means that different objects can be treated in a similar way even if they belong to different classes. There are two main types of polymorphism: method overloading and method overriding.

### Method Overloading (Compile-time Polymorphism):

* Imagine a chef who can make a dish with different ingredients. Depending on what's available, the chef can create the dish with just a few or many ingredients.
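* Similarly, in programming, method overloading allows a function or method to work with different sets of parameters. Python has no built-in compile-time overloading (a later `def` with the same name simply replaces the earlier one), so the usual way to get this effect is default or variable-length arguments.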

In [None]:
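def make_dish(ingredient1, ingredient2=None, ingredient3=None):
    # Cooking logic based on available ingredients: use only those that were provided
    available = [i for i in (ingredient1, ingredient2, ingredient3) if i is not None]
    return f"Dish made with {', '.join(available)}"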

### Method Overriding (Runtime Polymorphism):

* Think of different animals that can make sounds. A dog barks, a cat meows, and a bird chirps.
* Method overriding allows a subclass to provide a specific implementation for a method that is already defined in its superclass.

In [None]:
# Polymorphism
class Animal:
    def make_sound(self):
        pass  # Abstract method, to be overridden by subclasses
class Dog(Animal):
    def make_sound(self):
        return "Woof!"
class Cat(Animal):
    def make_sound(self):
        return "Meow!"

# Create instances of the child classes
dog_instance = Dog()
cat_instance = Cat()
# Call the make_sound method on instances
print(dog_instance.make_sound())  # Output: Woof!
print(cat_instance.make_sound())  # Output: Meow!


* When you call make_sound on an instance of Dog or Cat, each animal makes its own unique sound.

* In both cases, the idea is to handle different situations in a unified way, making code more flexible and easier to maintain.
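
The "unified way" mentioned above usually means looping over mixed objects and calling the same method on each. The sketch below is a self-contained variation of the example; raising `NotImplementedError` in the base class is a substitution for the original `pass`, chosen so that a missing override fails loudly.

```python
class Animal:
    def make_sound(self):
        raise NotImplementedError('subclasses must override make_sound')

class Dog(Animal):
    def make_sound(self):
        return 'Woof!'

class Cat(Animal):
    def make_sound(self):
        return 'Meow!'

# The loop does not need to know which concrete class each object belongs to.
sounds = [animal.make_sound() for animal in (Dog(), Cat())]
print(sounds)   # ['Woof!', 'Meow!']
```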

In [None]:
# Challenge
class Account():
  def __init__(self,owner,balance):
    self.owner = owner
    self.balance = balance

  def deposit(self, bal):
    self.balance = self.balance + bal
    return f'Deposit Accepted. Total: {self.balance}'

  def withdraw(self, withd):
    if self.balance >= withd:
      self.balance = self.balance - withd
      return f'Withdrawal Accepted. Total: {self.balance}'
    else:
      return 'Funds Unavailable'

  def __str__(self):
    return f'Account Owner: {self.owner}\nAccount Balance: {self.balance}'

acct1 = Account('Jose', 100)
print(acct1)

print(acct1.deposit(50))

print(acct1.withdraw(50))
print(acct1.withdraw(50))
print(acct1.withdraw(50))
print(acct1.withdraw(50))

Account Owner: Jose
Account Balance: 100
Deposit Accepted. Total: 150
Withdrawal Accepted. Total: 100
Withdrawal Accepted. Total: 50
Withdrawal Accepted. Total: 0
Funds Unavailable


## Decorators

* In Python, a decorator is a special type of function that can be used to modify the behavior of another function. Decorators are often used to extend or enhance the functionality of functions without modifying their actual code. They are applied using the `@decorator` syntax above the function definition.

In [None]:
# Define a basic decorator
def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

# Apply the decorator using the @ syntax
@my_decorator
def say_hello():
    print("Hello!")

# Call the decorated function
say_hello()

Something is happening before the function is called.
Hello!
Something is happening after the function is called.
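
The basic wrapper above takes no arguments and discards the return value. A more general sketch, using the standard `functools.wraps` helper, is shown below; `log_call` and `add` are hypothetical names chosen for illustration.

```python
import functools

def log_call(func):
    """Hypothetical decorator: prints the call, then returns the result."""
    @functools.wraps(func)              # keep func's __name__ and docstring
    def wrapper(*args, **kwargs):       # accept any arguments
        print(f'Calling {func.__name__} with {args} {kwargs}')
        return func(*args, **kwargs)    # pass the return value through
    return wrapper

@log_call
def add(a, b):
    """Return the sum of a and b."""
    return a + b

total = add(2, 3)
print(total)           # 5
print(add.__name__)    # add
```

Without `functools.wraps`, `add.__name__` would report `wrapper`, which makes debugging and documentation harder.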


In [None]:
import time

def time_logger(func):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"Execution Time: {end - start:.6f} seconds")
        return result
    return wrapper

@time_logger
def twoSum(nums=None, target=170):
    nums = list(range(1, 10001)) if nums is None else nums  # Larger range
    results = []
    for i in range(0, len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                results.append([nums[i], nums[j]])
    return results

@time_logger
def twoSumAll(nums=None, target=170):
    nums = list(range(1, 10001)) if nums is None else nums  # Larger range
    results = []
    seen = set()  # Values visited so far, for fast membership checks
    for val in nums:
        diff = target - val
        if diff in seen:
            results.append((diff, val))
        seen.add(val)
    return results  # Empty list when no pairs are found, so len() stays meaningful

print("Brute Force:")
print(len(twoSum()))
print("Hash-Based:")
print(len(twoSumAll()))


## Python Generators (yield):
* Generators are a special type of iterator that let you iterate over a potentially large sequence without loading the whole sequence into memory. They are created by a function that contains the `yield` keyword. Calling the generator function returns a generator object; none of the function body runs yet. Each call to `next()` on the generator (which invokes its `__next__()` method) runs the body until it reaches a `yield`, hands the yielded value to the caller, and pauses there. The next call resumes from that point and continues to the following `yield`, or raises `StopIteration` when the function ends.

In [None]:
def simple_generator():
    for i in range(0,6):
      yield i

# Using the generator
gen = simple_generator()
print(gen)

print(next(gen))  # Output: 0. The generator remembers where it paused and produces the next value on demand instead of keeping everything in memory (memory efficient).
print(next(gen))  # Output: 1
print(next(gen))  # Output: 2
print('-----')
for num in gen:
  print(num)

<generator object simple_generator at 0x7b147c35f760>
0
1
2
-----
3
4
5
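
Two consequences of this pause-and-resume behavior are worth seeing directly: a generator raises `StopIteration` once its body finishes, and an infinite generator is safe because values are only produced on demand. The sketch below uses hypothetical generators `countdown` and `naturals`, with `itertools.islice` to take a bounded number of values.

```python
from itertools import islice

def countdown(n):
    """Hypothetical generator: yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes right after that yield
try:
    next(gen)        # the body finishes, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

def naturals():
    """Infinite generator; safe because values are produced on demand."""
    n = 0
    while True:
        yield n
        n += 1

first_five_squares = [x * x for x in islice(naturals(), 5)]
print(first, second, exhausted, first_five_squares)
```

A `for` loop handles `StopIteration` internally, which is why looping over a generator simply ends.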


## Advanced Python Modules
* Collections
* OS Module and Datetime
* Math and Random
* Python Debugger
* Timeit
* Regular Expressions

In [None]:
# Collections
from collections import Counter
my_list = [1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,4,4,'a','a','b']
c = Counter(my_list)
print(c)
count_words = 'How many times words repeat in this sentence sentence'
count_words_list = count_words.split()
c = Counter(count_words_list)
print(c)
print(c.most_common(2)) # most_common(n) returns the n most common items as (item, count) pairs

from collections import defaultdict # defaultdict supplies a default value for missing keys instead of raising KeyError
d = defaultdict(lambda: 'default') # The factory function returns the default value, here the string 'default'
d['exist'] = 100
print(d['exist'])
d['notexist']

Counter({4: 10, 2: 5, 1: 4, 3: 4, 'a': 2, 'b': 1})
Counter({'sentence': 2, 'How': 1, 'many': 1, 'times': 1, 'words': 1, 'repeat': 1, 'in': 1, 'this': 1})
[('sentence', 2), ('How', 1)]
100


'default'
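
In practice `defaultdict` is most often used with `list` or `int` as the factory, for grouping and counting. The following is a small sketch with made-up data (`words` is illustrative only). Note that merely reading a missing key inserts it with the default value.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # illustrative data

by_letter = defaultdict(list)   # missing keys start as an empty list
for word in words:
    by_letter[word[0]].append(word)

lengths = defaultdict(int)      # missing keys start at 0
for word in words:
    lengths[len(word)] += 1

print(dict(by_letter))
print(dict(lengths))
```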

In [None]:
# OS Module
import os
print(os.getcwd())
print(os.listdir('/content'))

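import shutil # High-level file operations: copy, move, delete whole folder trees
shutil.move('source.txt', '/content/destination.txt')
os.unlink('/content/delete_file.txt') # Delete a file (shutil has no unlink; os.remove is an alias)
os.rmdir('/content/delete_folder.txt') # Delete an empty folder; use shutil.rmtree for a folder with contents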

for dirpath, dirnames, filenames in os.walk('/content/'): # traverse a directory and its subdirectories, retrieving information about the files and directories within
    print(f"Current Directory: {dirpath}")

/content
['.config', 'pk_img.jpg', 'sample_data']
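
The move and delete calls above depend on files that exist under `/content`. A self-contained sketch that can run anywhere is shown below. It assumes nothing beyond the standard library and works inside a temporary directory; all file and folder names are invented for the example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:      # removed automatically on exit
    src = os.path.join(root, 'source.txt')
    with open(src, 'w') as f:
        f.write('hello')

    dst = shutil.move(src, os.path.join(root, 'destination.txt'))  # move/rename
    moved = os.path.exists(dst) and not os.path.exists(src)

    os.unlink(dst)                                # delete a single file
    file_deleted = not os.path.exists(dst)

    folder = os.path.join(root, 'delete_folder')
    os.makedirs(os.path.join(folder, 'nested'))
    shutil.rmtree(folder)                         # delete a folder and its contents
    folder_deleted = not os.path.exists(folder)

print(moved, file_deleted, folder_deleted)
```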


In [None]:
# Datetime
import datetime
today = datetime.date.today()
print(today)
print(today.year)
print(today.month)

from datetime import datetime
print(datetime.today())

from datetime import date
date1 = date(2023,12,1)
date2 = date(2022,12,1)
print(date1 - date2)

2023-12-19
2023
12
2023-12-19 21:24:54.114579
365 days, 0:00:00
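
Subtracting two dates gives a `timedelta`, as in the last line above. A short sketch of the other common conversions follows: parsing text with `strptime`, formatting with `strftime`, and adding a `timedelta`. The dates are arbitrary examples.

```python
from datetime import datetime, timedelta

hired = datetime.strptime('2023-12-01 09:30', '%Y-%m-%d %H:%M')  # text -> datetime
review = hired + timedelta(days=90)                               # date arithmetic
label = review.strftime('%d/%m/%Y')                               # datetime -> text
elapsed = review - hired                                          # a timedelta

print(label, elapsed.days, elapsed.total_seconds())
```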


In [None]:
# Math
import math
print(math.sqrt(5))
print(math.floor(5.4))
print(math.ceil(5.4))
print(math.pi)
print(math.log(45, 10)) # 10 is Base
print(math.sin(90)) # Argument is in radians; use math.sin(math.radians(90)) for 90 degrees

# Random
print('------')
import random
print(random.randint(0,100)) # Use random.seed(101) to get the same random list
my_list = [1,2,3,4,5,6,7,8,9]
print(random.choice(my_list))
print(random.choices(population=my_list, k=10)) # Sample with replacement. Numbers might repeat here
print(random.sample(population=my_list, k=9)) # Sample without replacement. This is something like shuffle
random.shuffle(my_list) # Shuffle
my_list

2.23606797749979
5
6
3.141592653589793
1.6532125137753435
0.8939966636005579
------
76
1
[1, 2, 8, 5, 1, 9, 9, 8, 4, 6]
[1, 9, 2, 5, 4, 8, 7, 3, 6]


[9, 4, 2, 7, 8, 5, 3, 1, 6]
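
Two details from the cell above are easy to get wrong: trigonometric functions take radians, and random results only repeat when the seed is fixed. A minimal sketch (the seed value 101 is the one mentioned in the comment above; any integer works):

```python
import math
import random

sin_90_degrees = math.sin(math.radians(90))   # convert degrees to radians first
log_base_10 = math.log10(1000)                # dedicated base-10 logarithm

random.seed(101)                               # fix the seed ...
first_run = [random.randint(0, 100) for _ in range(5)]
random.seed(101)                               # ... and the sequence repeats
second_run = [random.randint(0, 100) for _ in range(5)]

print(sin_90_degrees, log_base_10, first_run == second_run)
```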

In [None]:
# Regular Expressions
import re
text = "My Phone number is 435-375-5117. Call my Phone if you have any questions"
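print('phone' in text) # False: the substring test is case-sensitive
print(re.search('Phone', text)) # search() returns the first match
print(re.findall('Phone', text)) # findall() returns all matches as a list of strings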
for i in re.finditer('Phone', text): # finditer()
  print(i)

# A Digit: file_\d\d\d = file_111
# A Non Digit: \D\D\D = ABC
# Alphanumeric: \w-\w\w\w = A-b_1
# Non-Alphanumeric: \W\W\W = *-+
# White Spaces: a\sb\sc = a b c
# Non White Spaces \S\S\S\S = Yoyo

print('Characters', re.search(r'\d\d\d-\d\d\d-\d\d\d\d', text)) # search()
print('Characters', re.search(r'\d{3}-\d{3}-\d{4}', text)) # search()

print('Or', re.search(r'Phone|have', text)) # Or
print('Wildcard', re.findall(r'.ne', text)) # Wild Card

False
<re.Match object; span=(3, 8), match='Phone'>
['Phone', 'Phone']
<re.Match object; span=(3, 8), match='Phone'>
<re.Match object; span=(41, 46), match='Phone'>
Characters <re.Match object; span=(19, 31), match='435-375-5117'>
Characters <re.Match object; span=(19, 31), match='435-375-5117'>
Or <re.Match object; span=(3, 8), match='Phone'>
Wildcard ['one', 'one']
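
Beyond finding a match, groups let you pull out its parts. The sketch below reuses the same `text` and adds named groups, a case-insensitive search, and a substitution. The group names `area`, `prefix`, and `line` are arbitrary labels chosen for this example.

```python
import re

text = "My Phone number is 435-375-5117. Call my Phone if you have any questions"

# Named groups label each part of the match; re.compile lets the pattern be reused.
phone_pattern = re.compile(r'(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})')

match = phone_pattern.search(text)
area = match.group('area')
whole = match.group()           # the full matched text
parts = match.groups()          # all groups as a tuple

# re.IGNORECASE makes a lowercase 'phone' lookup succeed.
found_lowercase = re.search('phone', text, flags=re.IGNORECASE) is not None

masked = phone_pattern.sub('XXX-XXX-XXXX', text)   # replace every match
print(area, whole, parts, found_lowercase)
print(masked)
```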


In [None]:
# Time
import time
start_time = time.time()
def fun1(n):
  return [str(num) for num in range(n)]
result = fun1(10000000)
end_time = time.time()
print(end_time-start_time)

start_time = time.time()
def fun2(n):
  return list(map(str,range(n)))
result = fun2(10000000)
end_time = time.time()
print(end_time-start_time)

# Timeit
import timeit
stmt = '''
fun1(100)
'''
setup = '''
def fun1(n):
  return [str(num) for num in range(n)]
'''
print(timeit.timeit(stmt,setup,number=10000))

stmt2 = '''
fun2(100)
'''
setup2 = '''
def fun2(n):
  return list(map(str,range(n)))
'''
print(timeit.timeit(stmt2,setup2,number=10000))

3.9610931873321533
1.8334031105041504
0.4081098449996716
0.11600154999996448
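
`timeit` also accepts a callable, which avoids repeating the function source in a setup string. The sketch below is one way to run the same comparison; the function names are hypothetical, and the timings depend on the machine, so no expected numbers are given.

```python
import timeit

def with_comprehension(n):
    return [str(num) for num in range(n)]

def with_map(n):
    return list(map(str, range(n)))

# repeat() returns one total per run; the minimum is the least noisy figure.
t_comp = min(timeit.repeat(lambda: with_comprehension(100), number=1000, repeat=3))
t_map = min(timeit.repeat(lambda: with_map(100), number=1000, repeat=3))

print(f"comprehension: {t_comp:.4f}s  map: {t_map:.4f}s")
```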


## Web Scraping

In [None]:
# pip install bs4
import requests
import bs4

result = requests.get('https://en.wikipedia.org/wiki/Pawan_Kalyan') # Request
soup = bs4.BeautifulSoup(result.text,'lxml') # Initialize BeautifulSoup
title = soup.select('title')[0].getText() # Select Content
print(title)
par = soup.select('p')
#print(par)
cls = soup.select('.mw-parser-output') # Class
#print(cls)
img = requests.get('https://upload.wikimedia.org/wikipedia/commons/thumb/1/15/PawanKalyan.jpg/150px-PawanKalyan.jpg') # Image
with open('pk_img.jpg', 'wb') as image_file: # The with block closes the file automatically
    image_file.write(img.content)

Pawan Kalyan - Wikipedia


## Advanced Python Objects and Data Structures

In [None]:
# Advanced Numbers
print(hex(1))
print(bin(1))
print(pow(2,3)) # Power 2**3
print(round(3.71234, 2))

0x1
0b1
8
3.71


In [None]:
# Advanced Strings
s = 'vinay achanta'
print(s.capitalize())
print(s.upper())
print(s.lower())
print(s.islower())
print(s.isupper())
print(s.isalpha())
print(s.count('a'))
print(s.find('a'))
print(s.split())
print(s.split('a'))

Vinay achanta
VINAY ACHANTA
vinay achanta
True
False
False
4
3
['vinay', 'achanta']
['vin', 'y ', 'ch', 'nt', '']


In [None]:
# Advanced Lists
l = [1,2,3,4,5]
l.append(6)
print(l)
l.extend([7,8,9,1])
print(l)
print(l.count(1))
print(l.index(5))
l.insert(2,'Inserted') # insert(index, object)
print(l)
l.pop(0) # Removes and returns the item at index 0
print(l)
l.remove('Inserted')
print(l)
l.reverse()
print(l)

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 1]
2
4
[1, 2, 'Inserted', 3, 4, 5, 6, 7, 8, 9, 1]
[2, 'Inserted', 3, 4, 5, 6, 7, 8, 9, 1]
[2, 3, 4, 5, 6, 7, 8, 9, 1]
[1, 9, 8, 7, 6, 5, 4, 3, 2]


In [None]:
# Advanced Sets
s = set()
s.add(1)
s.add(1) # Duplicate, ignored
s.add(2)
print(s)
s.clear()
print(s)
s = {1,2,3,1,1} # Duplicates are dropped
sc = s.copy() # Copy
print('Copy', sc)

s1 = {1,2,3}
s2 = {1,4,5}
print('Intersection: ', s1.intersection(s2)) # Returns a new set; s1 is unchanged
s1.difference_update(s2) # Removes the items of s2 from s1 in place
print('Difference Update: ',s1)

{1, 2}
set()
Copy {1, 2, 3}
Intersection:  {1}
Difference Update:  {2, 3}
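
Set methods come in two kinds: those that return a new set (`intersection`, `union`, `difference`) and the `_update` variants that modify the set in place. The non-mutating ones also have operator forms, sketched here with the same two sets:

```python
s1 = {1, 2, 3}
s2 = {1, 4, 5}

common = s1 & s2          # same as s1.intersection(s2)
combined = s1 | s2        # same as s1.union(s2)
only_in_s1 = s1 - s2      # same as s1.difference(s2)
in_exactly_one = s1 ^ s2  # same as s1.symmetric_difference(s2)

print(common, combined, only_in_s1, in_exactly_one)
print(s1)                 # unchanged: none of these operators modify s1
```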


In [None]:
# Advanced Dictionaries
d = {'k1':1, 'k2':2}
print(d)
print({x:x**2 for x in range(10)}) # Dictionary Comprehension


{'k1': 1, 'k2': 2}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
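
A few more dictionary operations tend to appear alongside comprehensions. This is a brief sketch using the same `d`; the extra keys `k3` and `k4` are invented for the example.

```python
d = {'k1': 1, 'k2': 2}

missing = d.get('k3', 0)          # default instead of KeyError
d.setdefault('k3', 3)             # insert only if the key is absent
d.update({'k1': 10, 'k4': 4})     # overwrite k1, add k4

doubled = {key: value * 2 for key, value in d.items()}              # transform values
evens = {key: value for key, value in d.items() if value % 2 == 0}  # filter items

print(missing, d)
print(doubled, evens)
```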


In [None]:
# Functions:
import math
math.sqrt(2)

txt = "50800"
txt.isdigit() # Check if all the characters in the text are digits

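text.replace(' ', '') # Removes all spaces from a string (returns a new string)

''.join(str(item) for item in my_list) # Joins the items of a list into one string; items must be strings, so convert them first

abs(-7.25) # Absolute value of a number: abs(100 - 10) and abs(10 - 100) are both 90

# display(row1, row2, row3) # In a notebook, displays several objects at once instead of using three print statements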

7.25

## Transformations

NumPy: NumPy (Numerical Python) is a popular Python library for numerical and scientific computing. It provides multidimensional arrays, known as `ndarray` (1D has a single axis, 2D has rows and columns, 3D adds a third axis), and a collection of mathematical functions that operate on these arrays efficiently.

Key Features:
* Efficient array operations: NumPy arrays (ndarray) are more efficient and faster compared to Python lists for numerical computations.
* Vectorized operations: It supports element-wise operations, allowing you to apply mathematical operations on entire arrays without needing loops.
* Mathematical functions: Provides a wide range of functions like linear algebra, statistics, Fourier transforms, and random number generation.

What NumPy is Mostly Used for:
* Data Manipulation: Manipulating and analyzing numerical data in the form of arrays (1D, 2D, etc.). You can efficiently perform operations like slicing, indexing, and reshaping arrays.

* Linear Algebra and Matrix Operations: NumPy has built-in support for matrix operations such as matrix multiplication, dot product, and matrix inversion.

* Data Preparation for Machine Learning: NumPy is often used for preprocessing data before feeding it into machine learning models. Many machine learning libraries, like TensorFlow or Scikit-learn, use NumPy arrays as input data.

* Numerical Analysis: It handles numerical operations like solving equations, mathematical computations, and simulations more efficiently.

In [None]:
import numpy as np

# Creating a 2D array (matrix)
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Perform element-wise multiplication
result = data * 2

print("Original array:")
print(data)

print("\nArray after multiplication by 2:")
print(result)

# Calculate the mean of the entire array
mean_value = np.mean(data)
print("\nMean value of the array:", mean_value)

Original array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Array after multiplication by 2:
[[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]

Mean value of the array: 5.0


Pandas is a widely-used Python library designed for data manipulation and analysis, especially for structured data. It builds on NumPy, providing more advanced data structures and tools specifically for working with tabular or labeled data (like spreadsheets or databases).

Key Features:

* Data Structures:
  - Series: A one-dimensional labeled array capable of holding any data type (like a column in a table).
  - DataFrame: A two-dimensional, mutable, labeled data structure (like a table or spreadsheet) where each column can have a different data type.

* Data Manipulation: Pandas makes it easy to clean, filter, group, merge, reshape, and aggregate data. It supports operations like joining tables, pivoting, reshaping, and grouping to perform analysis.

* Handling Missing Data: Built-in methods for detecting, filling, and removing missing values.

* Data Import and Export: Pandas supports reading data from many file formats such as CSV, Excel, SQL databases, and JSON, and writing data back to these formats after processing.

What Pandas is Mostly Used for:

* Data Wrangling and Cleaning: Pandas is the go-to library for cleaning messy data, such as removing missing values, filtering rows, changing column data types, and restructuring the data.

* Exploratory Data Analysis (EDA): Pandas lets you quickly explore data by computing summary statistics, creating visualizations, and getting an overview before further analysis.

* Time Series Analysis: Pandas has robust support for time-based data, with tools for handling date ranges, timestamps, and frequency conversions.

* Data Transformation: Pandas makes it easy to transform your data by adding new columns, applying functions, filtering, grouping, or aggregating values.

* Data Merging: You can combine multiple datasets using methods like merge, join, or concatenate, similar to SQL joins.

In [None]:
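# Core pandas objects:

# Series - a single labeled column
# Index - row/column labels, used for fast lookups and alignment
# DataFrame - a table of Series sharing one Index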

In [None]:
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 60000, 90000, 75000]
}

df = pd.DataFrame(data)

# Display the DataFrame
print("Original DataFrame:")
print(df)

# Filtering rows where Age > 25
filtered_df = df[df['Age'] > 25]

# Grouping by City and calculating the mean salary
grouped_df = df.groupby('City')['Salary'].mean()

# Adding a new column with 10% bonus on Salary
df['Salary with Bonus'] = df['Salary'] * 1.1

print("\nFiltered DataFrame (Age > 25):")
print(filtered_df)

print("\nGrouped by City (Mean Salary):")
print(grouped_df)

print("\nDataFrame with 'Salary with Bonus':")
print(df)

Original DataFrame:
      Name  Age         City  Salary
0    Alice   24     New York   70000
1      Bob   27  Los Angeles   80000
2  Charlie   22      Chicago   60000
3    David   32      Houston   90000
4      Eva   29      Phoenix   75000

Filtered DataFrame (Age > 25):
    Name  Age         City  Salary
1    Bob   27  Los Angeles   80000
3  David   32      Houston   90000
4    Eva   29      Phoenix   75000

Grouped by City (Mean Salary):
City
Chicago        60000.0
Houston        90000.0
Los Angeles    80000.0
New York       70000.0
Phoenix        75000.0
Name: Salary, dtype: float64

DataFrame with 'Salary with Bonus':
      Name  Age         City  Salary  Salary with Bonus
0    Alice   24     New York   70000            77000.0
1      Bob   27  Los Angeles   80000            88000.0
2  Charlie   22      Chicago   60000            66000.0
3    David   32      Houston   90000            99000.0
4      Eva   29      Phoenix   75000            82500.0


In [None]:
import pandas as pd
import numpy as np
from datetime import datetime

In [None]:
# Create a small DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 60000, 90000, 75000],
    'Hire Date': [
        datetime(2020, 5, 1),  # Alice's hire date
        datetime(2019, 7, 15), # Bob's hire date
        datetime(2021, 1, 20), # Charlie's hire date
        datetime(2018, 3, 30), # David's hire date
        datetime(2022, 10, 10) # Eva's hire date
    ]
}

In [None]:
df = pd.DataFrame(data)
print(type(df))
df.head()

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,Name,Age,City,Salary,Hire Date
0,Alice,24,New York,70000,2020-05-01
1,Bob,27,Los Angeles,80000,2019-07-15
2,Charlie,22,Chicago,60000,2021-01-20
3,David,32,Houston,90000,2018-03-30
4,Eva,29,Phoenix,75000,2022-10-10


In [None]:
print(df.shape)
print(df.index)
print(df['Age'].describe()) # Get to know about table or column

(5, 5)
RangeIndex(start=0, stop=5, step=1)
count     5.000000
mean     26.800000
std       3.962323
min      22.000000
25%      24.000000
50%      27.000000
75%      29.000000
max      32.000000
Name: Age, dtype: float64


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Name       5 non-null      object        
 1   Age        5 non-null      int64         
 2   City       5 non-null      object        
 3   Salary     5 non-null      int64         
 4   Hire Date  5 non-null      datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(2)
memory usage: 328.0+ bytes


In [None]:
df['City'].unique() # Returns the unique values in a column

array(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
      dtype=object)

In [None]:
df['City'].nunique() # Returns the number of unique values in a column

5

In [None]:
df['City'].value_counts() # Returns how many rows contain each value in a column

Unnamed: 0_level_0,count
City,Unnamed: 1_level_1
New York,1
Los Angeles,1
Chicago,1
Houston,1
Phoenix,1


In [None]:
df['Hire Date'].head() # Single Col

Unnamed: 0,Hire Date
0,2020-05-01
1,2019-07-15
2,2021-01-20
3,2018-03-30
4,2022-10-10


In [None]:
df[['Hire Date', 'Salary']].head() # Multiple Col

Unnamed: 0,Hire Date,Salary
0,2020-05-01,70000
1,2019-07-15,80000
2,2021-01-20,60000
3,2018-03-30,90000
4,2022-10-10,75000


## Slicing

In [None]:
df[2:4] # Slicing rows by position, start:end:step (prefer loc and iloc, especially on large datasets, since they are explicit about labels vs positions)
df[['Name', 'Age']][2:4] # Slicing rows of selected columns, start:end:step

Unnamed: 0,Name,Age
2,Charlie,22
3,David,32


In [None]:
# loc - Slices by index label; the end label is included
df.loc[0:2, ['Name', 'Age']]

Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27
2,Charlie,22


In [None]:
# iloc - Slices by integer position; the end position is excluded
df[['Name', 'Age']].iloc[0:2]

Unnamed: 0,Name,Age
0,Alice,24
1,Bob,27


## Filtering

In [None]:
filter_df = df[df['Salary'] >= 90000]
filter_df1 = df[(df['Salary'] >= 72000) & (df['City'] == 'Phoenix')]
display(filter_df)
display(filter_df1)

Unnamed: 0,Name,Age,City,Salary,Hire Date
3,David,32,Houston,90000,2018-03-30


Unnamed: 0,Name,Age,City,Salary,Hire Date
4,Eva,29,Phoenix,75000,2022-10-10


## Operations

In [None]:
name_and_city = df['Name'] + df['City'] # Element-wise string concatenation of two columns
name_and_city

Unnamed: 0,0
0,AliceNew York
1,BobLos Angeles
2,CharlieChicago
3,DavidHouston
4,EvaPhoenix


In [None]:
agemultipliedby2 = df['Age'] * 2
agemultipliedby2

Unnamed: 0,Age
0,48
1,54
2,44
3,64
4,58


In [None]:
# Function Apply
def times2(value):
  return value * 2

agetimes2 = df['Age'].apply(times2)
agetimes2

Unnamed: 0,Age
0,48
1,54
2,44
3,64
4,58


In [None]:
# Apply a custom function (e.g., categorize Age)
df['Age Group'] = df['Age'].apply(lambda x: 'Young' if x < 30 else 'Mature')
df

Unnamed: 0,Name,Age,City,Salary,Hire Date,Age Group
0,Alice,24,New York,70000,2020-05-01,Young
1,Bob,27,Los Angeles,80000,2019-07-15,Young
2,Charlie,22,Chicago,60000,2021-01-20,Young
3,David,32,Houston,90000,2018-03-30,Mature
4,Eva,29,Phoenix,75000,2022-10-10,Young


In [None]:
# Rename
df_col_renamed = df.rename(columns={'Age': 'Age_Years'}) # Returns a new DataFrame; pass inplace=True to modify df itself
display(df_col_renamed)

Unnamed: 0,Name,Age_Years,City,Salary,Hire Date,Age Group
0,Alice,24,New York,70000,2020-05-01,Young
1,Bob,27,Los Angeles,80000,2019-07-15,Young
2,Charlie,22,Chicago,60000,2021-01-20,Young
3,David,32,Houston,90000,2018-03-30,Mature
4,Eva,29,Phoenix,75000,2022-10-10,Young


In [None]:
df_drop = df.drop(labels=['Salary'], axis=1)
df_drop

Unnamed: 0,Name,Age,City,Hire Date,Age Group
0,Alice,24,New York,2020-05-01,Young
1,Bob,27,Los Angeles,2019-07-15,Young
2,Charlie,22,Chicago,2021-01-20,Young
3,David,32,Houston,2018-03-30,Mature
4,Eva,29,Phoenix,2022-10-10,Young


In [None]:
df_sort = df.sort_values('Salary', ascending=True)
df_sort

Unnamed: 0,Name,Age,City,Salary,Hire Date,Age Group
2,Charlie,22,Chicago,60000,2021-01-20,Young
0,Alice,24,New York,70000,2020-05-01,Young
4,Eva,29,Phoenix,75000,2022-10-10,Young
1,Bob,27,Los Angeles,80000,2019-07-15,Young
3,David,32,Houston,90000,2018-03-30,Mature


In [None]:
# Pivot DataFrame (e.g., Name as rows, City as columns, Salary as values)
# The pivot function reshapes the DataFrame, turning the unique values of one column (here 'City')
# into new columns, while using another column (here 'Name') as the row index.
# The 'values' parameter specifies which column fills the cells of the pivoted table ('Salary').
# The result has 'Name' as the index and one column per unique 'City',
# with the corresponding 'Salary' filled in and NaN where no Name/City pair exists.
pivot_df = df.pivot(index='Name', columns='City', values='Salary')
pivot_df

City     Chicago  Houston  Los Angeles  New York  Phoenix
Name                                                     
Alice        NaN      NaN          NaN   70000.0      NaN
Bob          NaN      NaN      80000.0       NaN      NaN
Charlie  60000.0      NaN          NaN       NaN      NaN
David        NaN  90000.0          NaN       NaN      NaN
Eva          NaN      NaN          NaN       NaN  75000.0


In [None]:
# Group by a column and aggregate (e.g., average Salary by City)
grouped_df = df.groupby('City')['Salary'].mean()
grouped_df

Unnamed: 0_level_0,Salary
City,Unnamed: 1_level_1
Chicago,60000.0
Houston,90000.0
Los Angeles,80000.0
New York,70000.0
Phoenix,75000.0


## Join, Merge, Concat

In [None]:
# Merge two DataFrames (example usage with a dummy df2)
df_table_1 = df[['Name', 'Age']]
df_table_2 = df[['Name', 'City', 'Salary', 'Hire Date']]
merged_df = pd.merge(df_table_1, df_table_2, on='Name', how='left')
merged_df

Unnamed: 0,Name,Age,City,Salary,Hire Date
0,Alice,24,New York,70000,2020-05-01
1,Bob,27,Los Angeles,80000,2019-07-15
2,Charlie,22,Chicago,60000,2021-01-20
3,David,32,Houston,90000,2018-03-30
4,Eva,29,Phoenix,75000,2022-10-10


In [None]:
# Join - aligns rows on the index; the suffixes disambiguate the shared 'Name' column
df_join = df_table_1.join(df_table_2, how='inner', lsuffix='_left', rsuffix='_right')
df_join

Unnamed: 0,Name_left,Age,Name_right,City,Salary,Hire Date
0,Alice,24,Alice,New York,70000,2020-05-01
1,Bob,27,Bob,Los Angeles,80000,2019-07-15
2,Charlie,22,Charlie,Chicago,60000,2021-01-20
3,David,32,David,Houston,90000,2018-03-30
4,Eva,29,Eva,Phoenix,75000,2022-10-10


In [None]:
# Concat
df_concat = pd.concat([df_table_1, df_table_2], axis=1)
df_concat

Unnamed: 0,Name,Age,Name.1,City,Salary,Hire Date
0,Alice,24,Alice,New York,70000,2020-05-01
1,Bob,27,Bob,Los Angeles,80000,2019-07-15
2,Charlie,22,Charlie,Chicago,60000,2021-01-20
3,David,32,David,Houston,90000,2018-03-30
4,Eva,29,Eva,Phoenix,75000,2022-10-10


In [None]:
# Handle missing data (e.g., fill missing values with a default)
df['Bonus'] = [5000, None, 4000, None, 4500]
df['Bonus'] = df['Bonus'].fillna(0)  # Fill missing Bonus with 0 (assigning back avoids in-place changes on a column selection)
df

Unnamed: 0,Name,Age,City,Salary,Hire Date,Age Group,Bonus
0,Alice,24,New York,70000,2020-05-01,Young,5000.0
1,Bob,27,Los Angeles,80000,2019-07-15,Young,0.0
2,Charlie,22,Chicago,60000,2021-01-20,Young,4000.0
3,David,32,Houston,90000,2018-03-30,Mature,0.0
4,Eva,29,Phoenix,75000,2022-10-10,Young,4500.0


In [None]:
# Derive a new column from a datetime column (e.g., extract the year from Hire Date)
df['Hire Year'] = df['Hire Date'].dt.year
df

Unnamed: 0,Name,Age,City,Salary,Hire Date,Age Group,Bonus,Hire Year
0,Alice,24,New York,70000,2020-05-01,Young,5000.0,2020
1,Bob,27,Los Angeles,80000,2019-07-15,Young,0.0,2019
2,Charlie,22,Chicago,60000,2021-01-20,Young,4000.0,2021
3,David,32,Houston,90000,2018-03-30,Mature,0.0,2018
4,Eva,29,Phoenix,75000,2022-10-10,Young,4500.0,2022


In [None]:
# Remove duplicate rows (if any)
df.drop_duplicates(inplace=True)
df

Unnamed: 0,Name,Age,City,Salary,Hire Date,Age Group,Bonus,Hire Year
0,Alice,24,New York,70000,2020-05-01,Young,5000.0,2020
1,Bob,27,Los Angeles,80000,2019-07-15,Young,0.0,2019
2,Charlie,22,Chicago,60000,2021-01-20,Young,4000.0,2021
3,David,32,Houston,90000,2018-03-30,Mature,0.0,2018
4,Eva,29,Phoenix,75000,2022-10-10,Young,4500.0,2022


## Working with Date Columns

In [None]:
df_dates = pd.DataFrame({'date':['3/10/2000', '3/11/2000', '3/12/2000'], 'value': [2,3,4]})
df_dates

Unnamed: 0,date,value
0,3/10/2000,2
1,3/11/2000,3
2,3/12/2000,4


In [None]:
df_dates.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   date    3 non-null      object
 1   value   3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes


In [None]:
# Convert to datetime; errors='coerce' turns unparseable values into NaT, errors='raise' (the default) raises an exception
df_dates['date'] = pd.to_datetime(df_dates['date'], format='%m/%d/%Y', errors='coerce')
df_dates.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   date    3 non-null      datetime64[ns]
 1   value   3 non-null      int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 176.0 bytes
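
To see what error handling does during conversion, here is a small sketch assuming pandas is installed. The series `raw` is invented for the example and deliberately contains one value that is not a date.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', '3/11/2000', 'not a date'])  # illustrative data

# An explicit format avoids day/month ambiguity; errors='coerce' turns
# unparseable entries into NaT instead of raising an exception.
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

weekday_names = parsed.dt.day_name()   # weekday names; missing for NaT rows

print(parsed.isna().tolist())
print(weekday_names.tolist())
```

Checking `parsed.isna()` after a coerced conversion is a quick way to find the rows that failed to parse.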


In [None]:
df_dates['year'] = df_dates['date'].dt.year
df_dates['month'] = df_dates['date'].dt.month
df_dates['day'] = df_dates['date'].dt.day
df_dates['day_of_week'] = df_dates['date'].dt.dayofweek
df_dates['is_leap_year'] = df_dates['date'].dt.is_leap_year
df_dates

Unnamed: 0,date,value,year,month,day,day_of_week,is_leap_year
0,2000-03-10,2,2000,3,10,4,True
1,2000-03-11,3,2000,3,11,5,True
2,2000-03-12,4,2000,3,12,6,True


In [None]:
dw_mapping = {
    0: 'Monday',
    1: 'Tuesday',
    2: 'Wednesday',
    3: 'Thursday',
    4: 'Friday',
    5: 'Saturday',
    6: 'Sunday'
}
df_dates['day_of_the_week_name'] = df_dates['date'].dt.weekday.map(dw_mapping)
df_dates

Unnamed: 0,date,value,year,month,day,day_of_week,is_leap_year,day_of_the_week_name
0,2000-03-10,2,2000,3,10,4,True,Friday
1,2000-03-11,3,2000,3,11,5,True,Saturday
2,2000-03-12,4,2000,3,12,6,True,Sunday


In [None]:
# Difference between the current year and each date's year
pd.to_datetime('today').year - df_dates['date'].dt.year

Unnamed: 0,date
0,24
1,24
2,24


In [None]:
# date as index for faster lookups
df_dates = df_dates.set_index(['date'])
df_dates

Unnamed: 0_level_0,value,year,month,day,day_of_week,is_leap_year,day_of_the_week_name
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
2000-03-10,2,2000,3,10,4,True,Friday
2000-03-11,3,2000,3,11,5,True,Saturday
2000-03-12,4,2000,3,12,6,True,Sunday


In [None]:
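df_dates.between_time('1:2', '2:3') # Rows whose time of day falls between 01:02 and 02:03; empty here because every timestamp is at midnight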

Unnamed: 0_level_0,value,year,month,day,day_of_week,is_leap_year,day_of_the_week_name
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1


## Different File Formats

### JSON: JavaScript Object Notation - Mostly used on the web and in APIs

In [None]:
import json

In [None]:
data = {
    "india": {
        "pm": "Modi",
        "cm": "Pawan Kalyan"
    }
}

In [None]:
# json.dump
with open("datafile.json", "w") as write_file:
  json.dump(data, write_file)

In [None]:
# json.load
with open("datafile.json", "r") as read_file:
  data = json.load(read_file)

In [None]:
print(type(data))
data

<class 'dict'>


{'india': {'pm': 'Modi', 'cm': 'Pawan Kalyan'}}
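
`json.dump` and `json.load` work with files; their string counterparts `json.dumps` and `json.loads` are handy when the JSON comes from an API response or needs to be printed. A minimal round-trip sketch with the same dictionary:

```python
import json

data = {"india": {"pm": "Modi", "cm": "Pawan Kalyan"}}

text = json.dumps(data, indent=2, sort_keys=True)  # dict -> JSON string
restored = json.loads(text)                        # JSON string -> dict

print(text)
print(restored == data)
```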

In [None]:
# df_json = pd.read_json(data, orient='index')

In [None]:
# JSON to DataFrame using from_dict
df_json = pd.DataFrame.from_dict(data, orient='index') # data['india']
df_json

Unnamed: 0,pm,cm
india,Modi,Pawan Kalyan


### CSV: Comma Separated Values File - Tabular Data

In [None]:
# read_csv
# df_csv = pd.read_csv('data/file.csv', index_col="Name")
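
For small files, the standard-library `csv` module is an alternative to pandas. The sketch below uses an in-memory `io.StringIO` buffer in place of a file on disk so it can run anywhere; the rows are invented for the example.

```python
import csv
import io

rows = [{'Name': 'Alice', 'Age': 24}, {'Name': 'Bob', 'Age': 27}]  # illustrative data

buffer = io.StringIO()                       # stands in for a file on disk
writer = csv.DictWriter(buffer, fieldnames=['Name', 'Age'])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
loaded = list(csv.DictReader(buffer))        # every value is read back as text
ages = [int(row['Age']) for row in loaded]   # so numbers need explicit conversion

print(buffer.getvalue())
print(loaded[0]['Name'], ages)
```

Unlike `pd.read_csv`, the `csv` module does no type inference, which is why `Age` comes back as a string.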

### Excel Files

In [None]:
pip install openpyxl



In [None]:
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet['A1'] = 'hello'
sheet['B1'] = 'world'
workbook.save(filename='hello.xlsx')

In [None]:
from openpyxl import load_workbook

workbook = load_workbook(filename='hello.xlsx')
workbook.sheetnames # ['Sheet'] - a new workbook has one sheet titled 'Sheet'
sheet = workbook.active
sheet

<Worksheet "Sheet">

In [None]:
sheet.title

'Sheet'

In [None]:
sheet.cell(row=1, column=1).value

'hello'

In [None]:
df_excel = pd.read_excel('hello.xlsx')
df_excel

Unnamed: 0,hello,world


### Apache Avro: Data Serialization Format

* Avro is a row-based format well suited to evolving data schemas. The metadata travels with the data: if you have the .avro file, you have the schema as well (Excel and CSV files cannot carry a schema; they hold only the data).

In [None]:
pip install avro-python3

Collecting avro-python3
  Downloading avro-python3-1.10.2.tar.gz (38 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: avro-python3
  Building wheel for avro-python3 (setup.py) ... done
  Created wheel for avro-python3: filename=avro_python3-1.10.2-py3-none-any.whl size=43994 sha256=670d488f67df851106646b94ffdcb575e7de418707cab290f1c8bf02025da6cf
  Stored in directory: /root/.cache/pip/wheels/bc/85/62/6cdd81c56f923946b401cecff38055b94c9b766927f7d8ca82
Successfully built avro-python3
Installing collected packages: avro-python3
Successfully installed avro-python3-1.10.2


In [None]:
import copy
import json
import avro
from avro.datafile import DataFileWriter, DataFileReader
from avro.io import DatumWriter, DatumReader

In [None]:
schema = {
    "name": 'avro.example.User',
    "type": "record",
    "fields": [
        {'name': 'name', 'type': 'string'},
        {'name': 'age', 'type': 'int'}
    ]
    }
schema_parsed = avro.schema.Parse(json.dumps(schema))
schema_parsed

<avro.schema.RecordSchema at 0x7b6c50c76110>

In [None]:
with open('users.avro', 'wb') as f:
  writer = DataFileWriter(f, DatumWriter(), schema_parsed)
  writer.append({'name': 'Vinay', 'age':25})
  writer.close()

In [None]:
with open('users.avro', 'rb') as f:
  reader = DataFileReader(f, DatumReader())
  metadata = copy.deepcopy(reader.meta)
  schema_from_file = json.loads(metadata['avro.schema'])
  users = [user for user in reader]
  reader.close()

print(f'schema that we specified:\n {schema}')
print(f'schema that we parsed:\n {schema_parsed}')
print(f'schema from users.avro file:\n {schema_from_file}')
print(f'Users:\n {users}')

schema that we specified:
 {'name': 'avro.example.User', 'type': 'record', 'fields': [{'name': 'name', 'type': 'string'}, {'name': 'age', 'type': 'int'}]}
schema that we parsed:
 {"type": "record", "name": "User", "namespace": "avro.example", "fields": [{"type": "string", "name": "name"}, {"type": "int", "name": "age"}]}
schema from users.avro file:
 {'type': 'record', 'name': 'User', 'namespace': 'avro.example', 'fields': [{'type': 'string', 'name': 'name'}, {'type': 'int', 'name': 'age'}]}
Users:
 [{'name': 'Vinay', 'age': 25}]


In [None]:
# Using pandas
!pip install pandavro

Collecting pandavro
  Downloading pandavro-1.8.0-py3-none-any.whl.metadata (8.5 kB)
Collecting fastavro<2.0.0,>=1.5.1 (from pandavro)
  Downloading fastavro-1.9.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.5 kB)
Downloading pandavro-1.8.0-py3-none-any.whl (8.8 kB)
Downloading fastavro-1.9.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Installing collected packages: fastavro, pandavro
Successfully installed fastavro-1.9.7 pandavro-1.8.0


In [None]:
import pandavro as pdx

In [None]:
users = [{'name': 'Vinay', 'age': '25'},
        {'name': 'Chandana', 'age': '26'}]
users_df = pd.DataFrame.from_records(users)
print(users_df)

       name age
0     Vinay  25
1  Chandana  26


In [None]:
pdx.to_avro('user_test.avro', users_df)

In [None]:
users_df_redux = pdx.from_avro('user_test.avro')
print(type(users_df_redux))

<class 'pandas.core.frame.DataFrame'>


In [None]:
with open('user_test.avro', 'rb') as f:
  reader = DataFileReader(f, DatumReader())
  metadata = copy.deepcopy(reader.meta)
  schema_from_file = json.loads(metadata['avro.schema'])
  reader.close()
print(schema_from_file)

{'type': 'record', 'name': 'Root', 'fields': [{'name': 'name', 'type': ['null', 'string']}, {'name': 'age', 'type': ['null', 'string']}]}
