### **This course provides a comprehensive guide to setting up Python virtual environments with Miniconda, working with data structures, manipulating Pandas DataFrames, applying algorithmic principles, utilizing regular expressions, and efficiently handling file operations.**

# Installing and Using Miniconda

## Table of Contents
1. [Introduction to Miniconda](#1-introduction-to-miniconda)
2. [Installing Miniconda](#2-installing-miniconda)
3. [Creating and Managing Conda Environments](#3-creating-and-managing-conda-environments)
4. [Installing and Managing Packages](#4-installing-and-managing-packages)
5. [Working with Jupyter Notebook in a Conda Environment](#5-working-with-jupyter-notebook-in-a-conda-environment)
6. [Updating and Uninstalling Miniconda](#6-updating-and-uninstalling-miniconda)
7. [Exercises](#7-exercises)

## 1. Introduction to Miniconda
Miniconda is a minimalistic distribution of Anaconda that includes only `conda`, Python, and essential packages. It allows users to create isolated environments and manage dependencies efficiently.

## 2. Installing Miniconda
### **Windows Installation**
1. Download Miniconda from [Miniconda Official Website](https://docs.conda.io/en/latest/miniconda.html)
2. Run the installer and follow on-screen instructions.
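3. Leave "Add Miniconda to PATH" unchecked unless you specifically need it; the installer itself marks this option as not recommended because it can conflict with other software.
4. Open **Anaconda Prompt** from the Start menu (or `cmd`, if you added Miniconda to PATH) and verify the installation: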
```bash
conda --version
```

### **Linux/Mac Installation**
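1. Download the latest Miniconda installer. The command below fetches the Linux x86_64 build; on macOS, download `Miniconda3-latest-MacOSX-arm64.sh` (Apple silicon) or `Miniconda3-latest-MacOSX-x86_64.sh` (Intel) from the same location and adjust the file name in the next step: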
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
```
2. Run the installer:
```bash
bash Miniconda3-latest-Linux-x86_64.sh
```
3. Follow the instructions, restart the shell, and verify:
```bash
conda --version
```

## 3. Creating and Managing Conda Environments
### **Create a new environment**
```bash
conda create --name my_env python=3.9
```
### **Activate the environment**
```bash
conda activate my_env
```
### **Deactivate the environment**
```bash
conda deactivate
```
### **List all environments**
```bash
conda env list
```
### **Remove an environment**
```bash
conda remove --name my_env --all
```

## 4. Installing and Managing Packages
### **Install a package**
```bash
conda install numpy pandas
```
### **Uninstall a package**
```bash
conda remove numpy
```
### **Check installed packages**
```bash
conda list
```
### **Update a package**
```bash
conda update pandas
```
### **Install packages from specific channels**
```bash
conda install -c conda-forge matplotlib
```

## 5. Working with Jupyter Notebook in a Conda Environment
### **Install Jupyter Notebook in a Conda Environment**
```bash
conda install -c conda-forge notebook
```
### **Run Jupyter Notebook**
```bash
jupyter notebook
```
### **Add the Environment to Jupyter Kernel**
```bash
conda install ipykernel
python -m ipykernel install --user --name=my_env --display-name "Python (my_env)"
```

## 6. Updating and Uninstalling Miniconda
### **Update Conda**
```bash
conda update conda
```
### **Uninstall Miniconda (Linux/Mac)**
```bash
rm -rf ~/miniconda3
```
### **Uninstall Miniconda (Windows)**
1. Open "Add or Remove Programs" and uninstall Miniconda.
2. Delete the `C:\Users\YourUsername\miniconda3` folder manually.

## 7. Exercises
### **1: Install and Verify Miniconda**
- Install Miniconda on your system and verify the installation with `conda --version`.

### **2: Create and Manage Environments**
- Create an environment named `data_env` with Python 3.9.
- Install `numpy` and `pandas` in this environment.
- Deactivate and then remove the environment.

### **3: Work with Jupyter in Conda**
- Create an environment with Jupyter installed.
- Open Jupyter Notebook and verify that your environment is listed.

### **4: Install and Update Packages**
- Install `scipy` and `matplotlib` in a new environment.
- Update `matplotlib` to the latest version.
- List all installed packages.


# Python Data Structures Course

## Table of Contents
1. [Python Lists](#1-python-lists)
2. [Character Strings](#2-character-strings)
3. [Lists of Constants (Tuples)](#3-lists-of-constants-tuples)
4. [Python Dictionaries](#4-python-dictionaries)
5. [Sets](#5-sets)
6. [Arrays (2D and n-dimensional)](#6-arrays-with-numpy)
7. [DataFrames with Pandas](#7-dataframes-with-pandas)
8. [Functions](#8-functions)
9. [Object-Oriented Programming (OOP)](#9-object-oriented-programming-oop)
10. [Lambda Functions](#10-lambda-functions)
11. [Filter & Reduce](#11-filter-and-reduce)
12. [Listing and Deleting Existing Objects](#12-listing-and-deleting-existing-objects)
13. [Exercises](#13-exercises)

## 1. Python Lists
Lists are one of the most commonly used data structures in Python. They are ordered, mutable, and allow duplicate values.

In [100]:
my_list = [1, 2, 3, 4, 5]
print("List:", my_list)
my_list.append(6)
print("After appending 6:", my_list)
my_list.remove(2)
print("After removing 2:", my_list)

List: [1, 2, 3, 4, 5]
After appending 6: [1, 2, 3, 4, 5, 6]
After removing 2: [1, 3, 4, 5, 6]


## 2. Character Strings
Strings in Python are immutable sequences of characters. They support slicing and many built-in methods.

In [101]:
my_string = "Hello, Python!"
print("String:", my_string)
print("Length:", len(my_string))
print("Uppercase:", my_string.upper())
print("Substring:", my_string[7:13])

String: Hello, Python!
Length: 14
Uppercase: HELLO, PYTHON!
Substring: Python


## 3. Lists of Constants (Tuples)
Tuples are immutable sequences, meaning their elements cannot be changed after creation. They are often used to store fixed collections of data.

In [102]:
my_tuple = (10, 20, 30, 40, 50)
print("Tuple:", my_tuple)
print("First element:", my_tuple[0])

Tuple: (10, 20, 30, 40, 50)
First element: 10
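
To make the immutability claim concrete, the short sketch below shows that item assignment on a tuple raises `TypeError`, while unpacking still reads the values out freely. The `point` tuple and its values are illustrative only.

```python
# Tuples reject item assignment, but unpacking works as usual.
point = (10, 20, 30)

try:
    point[0] = 99              # attempt to modify an element
except TypeError as exc:
    error_message = str(exc)
    print("Cannot modify tuple:", error_message)

x, y, z = point                # unpack into three variables
print("Unpacked:", x, y, z)
```

If the contents really need to change, convert to a list with `list(point)`, modify it, and build a new tuple.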


## 4. Python Dictionaries
Dictionaries store key-value pairs, providing a fast and flexible way to store and retrieve data.

In [103]:
my_dict = {"name": "Alice", "age": 25, "city": "Paris"}
print("Dictionary:", my_dict)
my_dict["age"] = 26  # Modifying a value
print("Updated Dictionary:", my_dict)
print("Keys:", my_dict.keys())
print("Values:", my_dict.values())

Dictionary: {'name': 'Alice', 'age': 25, 'city': 'Paris'}
Updated Dictionary: {'name': 'Alice', 'age': 26, 'city': 'Paris'}
Keys: dict_keys(['name', 'age', 'city'])
Values: dict_values(['Alice', 26, 'Paris'])


## 5. Sets
Sets are unordered collections of unique elements. They support operations like union, intersection, and difference.

In [104]:
my_set = {1, 2, 3, 4, 5}
my_set.add(6)
print("Set after adding 6:", my_set)
my_set.remove(3)
print("Set after removing 3:", my_set)
print("Union:", my_set | {5, 6, 7, 8})
print("Intersection:", my_set & {3, 4, 5})

Set after adding 6: {1, 2, 3, 4, 5, 6}
Set after removing 3: {1, 2, 4, 5, 6}
Union: {1, 2, 4, 5, 6, 7, 8}
Intersection: {4, 5}
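
The text above also mentions difference, which the cell does not show. A small sketch with two illustrative sets `a` and `b` covers difference and symmetric difference:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

only_in_a = a - b          # difference: elements of a that are not in b
in_exactly_one = a ^ b     # symmetric difference: in a or b, but not both

# sorted() is used only to print in a predictable order
print("Difference:", sorted(only_in_a))
print("Symmetric difference:", sorted(in_exactly_one))
```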


## 6. Arrays with NumPy
NumPy arrays allow fast mathematical operations and support multi-dimensional structures.

In [105]:
import numpy as np
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", array_2d)
array_nd = np.random.rand(2, 3, 4)  # 3D array example
print("3D Array:\n", array_nd)

2D Array:
 [[1 2 3]
 [4 5 6]]
3D Array:
 [[[0.95645728 0.73809443 0.91178107 0.09655325]
  [0.47323105 0.33224491 0.93292007 0.55061886]
  [0.59379559 0.32977716 0.07503536 0.68739107]]

 [[0.52200763 0.74939341 0.74618466 0.73849096]
  [0.3340667  0.33005171 0.31508431 0.18747933]
  [0.65134207 0.30787661 0.37444515 0.13117113]]]


## 7. DataFrames with Pandas
Pandas DataFrames provide a powerful way to handle tabular data.

In [106]:
import pandas as pd
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35], "City": ["Paris", "Berlin", "New York"]}
df = pd.DataFrame(data)
print("DataFrame:\n", df)

DataFrame:
       Name  Age      City
0    Alice   25     Paris
1      Bob   30    Berlin
2  Charlie   35  New York


## 8. Functions
Functions allow code reusability and modularization.

In [107]:
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))

Hello, Alice!


## 9. Object-Oriented Programming (OOP)
Python supports OOP concepts such as classes and objects.

In [108]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def introduce(self):
        return f"Hi, I'm {self.name} and I'm {self.age} years old."

person1 = Person("Alice", 25)
print(person1.introduce())

Hi, I'm Alice and I'm 25 years old.


## 10. Lambda Functions
Lambda functions are anonymous, inline functions.

In [109]:
square = lambda x: x ** 2
print("Square of 5:", square(5))

Square of 5: 25
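
Lambdas are most useful when passed directly to another function rather than bound to a name. As a hedged illustration (the `people` list is made up for this example), a lambda can serve as a sort key:

```python
people = [("Alice", 25), ("Bob", 30), ("Charlie", 20)]

# Sort by the second item of each tuple (the age)
by_age = sorted(people, key=lambda person: person[1])
print(by_age)
```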


## 11. Filter and Reduce
`filter()` and `reduce()` are used for functional-style processing of collections.

In [110]:
from functools import reduce

numbers = [1, 2, 3, 4, 5]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print("Even numbers:", even_numbers)

sum_numbers = reduce(lambda x, y: x + y, numbers)
print("Sum of all numbers:", sum_numbers)

Even numbers: [2, 4]
Sum of all numbers: 15


## 12. Listing and Deleting Existing Objects
We can list all objects in memory using `dir()` and delete specific objects using `del`.

In [111]:
print("Existing Objects:", dir())
del my_list
print("After deleting my_list:", dir())

Existing Objects: ['In', 'Out', 'Person', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i10', '_i100', ... (output truncated)
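
In a notebook, `dir()` also returns IPython's internal names (such as `_i1` or `_dh`), which buries the objects you created yourself. A minimal sketch, assuming you only care about names without a leading underscore, filters them out and confirms what `del` removed. The variable names are illustrative.

```python
my_list = [1, 2, 3]
numbers = (4, 5)

# Keep only names that do not start with an underscore
before = [name for name in dir() if not name.startswith("_")]
del my_list
after = [name for name in dir() if not name.startswith("_")]

removed = set(before) - set(after)
print("Removed by del:", removed)
```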

## 13. Exercises
### Exercise 6: Sets Operations
- Create two sets of numbers and perform union and intersection.
- Remove an element from a set and print the updated set.

### Exercise 7: Functions
- Write a function that takes two numbers and returns their sum.
- Modify the function to return both the sum and product of the numbers.

### Exercise 8: OOP
- Create a `Car` class with `brand` and `speed` attributes.
- Add a method that prints a statement about the car's speed.

### Exercise 9: Lambda Functions
- Write a lambda function that multiplies two numbers.
- Use `map()` to apply a lambda function to a list of numbers.

### Exercise 10: Filter and Reduce
- Use `filter()` to extract odd numbers from a list.
- Use `reduce()` to compute the product of all elements in a list.

# Advanced Pandas DataFrame Course

## Table of Contents
1. [Introduction to Pandas](#1-introduction-to-pandas)
2. [Creating DataFrames](#2-creating-dataframes)
3. [Indexing and Selecting Data](#3-indexing-and-selecting-data)
4. [Data Cleaning and Handling Missing Values](#4-data-cleaning-and-handling-missing-values)
5. [Data Transformation](#5-data-transformation)
6. [Aggregation and Grouping](#6-aggregation-and-grouping)
7. [Merging and Joining DataFrames](#7-merging-and-joining-dataframes)
8. [Working with Time Series Data](#8-working-with-time-series-data)
9. [Performance Optimization](#9-performance-optimization)
10. [Exercises](#10-exercises)

## 1. Introduction to Pandas
Pandas is a powerful Python library for data manipulation and analysis. It provides DataFrame and Series structures that allow efficient handling of structured data.

In [112]:
import pandas as pd
print("Pandas version:", pd.__version__)

Pandas version: 2.2.2


## 2. Creating DataFrames
A DataFrame is a 2D labeled data structure, similar to a spreadsheet.

In [113]:
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"]
}
df = pd.DataFrame(data)
print(df)

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris


## 3. Indexing and Selecting Data
### Selecting Columns

In [114]:
print(df["Name"])

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object


### Selecting Rows

In [115]:
print(df.loc[1])
print(df.iloc[0])

Name       Bob
Age         30
City    London
Name: 1, dtype: object
Name       Alice
Age           25
City    New York
Name: 0, dtype: object


## 4. Data Cleaning and Handling Missing Values
### Handling Missing Data

In [116]:
df["Salary"] = [50000, None, 70000]
print(df.isnull())

    Name    Age   City  Salary
0  False  False  False   False
1  False  False  False    True
2  False  False  False   False


### Filling Missing Values

In [117]:
df["Salary"] = df["Salary"].fillna(df["Salary"].mean())  # Fill only the Salary column
print(df)

      Name  Age      City   Salary
0    Alice   25  New York  50000.0
1      Bob   30    London  60000.0
2  Charlie   35     Paris  70000.0


## 5. Data Transformation
### Applying Functions

In [118]:
df["Age in 10 Years"] = df["Age"].apply(lambda x: x + 10)
print(df)

      Name  Age      City   Salary  Age in 10 Years
0    Alice   25  New York  50000.0               35
1      Bob   30    London  60000.0               40
2  Charlie   35     Paris  70000.0               45


## 6. Aggregation and Grouping
### Grouping Data

In [119]:
data = {
    "Department": ["IT", "HR", "IT", "HR", "Finance"],
    "Salary": [70000, 60000, 75000, 65000, 80000]
}
df = pd.DataFrame(data)
print(df.groupby("Department")["Salary"].mean())

Department
Finance    80000.0
HR         62500.0
IT         72500.0
Name: Salary, dtype: float64


## 7. Merging and Joining DataFrames
### Merging Two DataFrames

In [120]:
df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Salary": [50000, 60000, 70000]})
merged_df = pd.merge(df1, df2, on="ID", how="left")
print(merged_df)

   ID     Name   Salary
0   1    Alice  50000.0
1   2      Bob  60000.0
2   3  Charlie      NaN


## 8. Working with Time Series Data
### Creating a Time Series

In [121]:
df["Date"] = pd.date_range(start="2023-01-01", periods=len(df), freq="D")
print(df)

  Department  Salary       Date
0         IT   70000 2023-01-01
1         HR   60000 2023-01-02
2         IT   75000 2023-01-03
3         HR   65000 2023-01-04
4    Finance   80000 2023-01-05


## 9. Performance Optimization
### Using Efficient Data Types

In [122]:
df["Salary"] = df["Salary"].astype("int32")
print(df.dtypes)

Department            object
Salary                 int32
Date          datetime64[ns]
dtype: object


## 10. Exercises
### Exercise 1: Data Cleaning
- Load a dataset with missing values and clean it.
- Fill missing values using median instead of mean.

### Exercise 2: Data Transformation
- Create a new column with a transformed version of an existing column.
- Apply a custom function to each row.

### Exercise 3: Aggregation and Grouping
- Group data by a categorical column and compute statistics.
- Find the highest value in each group.

### Exercise 4: Merging DataFrames
- Merge two datasets on a common key using different join types.
- Handle missing values after merging.

# Advanced Python Algorithms Course

## Table of Contents
1. [Sorting Algorithms](#1-sorting-algorithms)
2. [Graph Algorithms](#2-graph-algorithms)
3. [Dynamic Programming](#3-dynamic-programming)
4. [Backtracking](#4-backtracking)
5. [Greedy Algorithms](#5-greedy-algorithms)
6. [Divide and Conquer](#6-divide-and-conquer)
7. [Exercises](#7-exercises)

## 1. Sorting Algorithms
Sorting algorithms are used to arrange elements in a specific order. Efficient sorting is crucial in programming, as it improves data retrieval and processing speed.

### Quick Sort
Quick Sort is a divide-and-conquer algorithm that works by selecting a 'pivot' and partitioning the elements around it.

In [123]:
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

arr = [3, 6, 8, 10, 1, 2, 1]
print("Sorted array:", quick_sort(arr))

Sorted array: [1, 1, 2, 3, 6, 8, 10]


**Explanation:**
- A pivot is chosen (middle element).
- Elements are divided into those smaller, equal, and larger than the pivot.
- The process is recursively applied to the left and right partitions.
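
The version above builds new lists at every level of recursion. As a hedged alternative, the sketch below sorts in place using Lomuto partitioning. Note that it picks the last element as the pivot rather than the middle one, and the names `partition` and `quick_sort_in_place` are hypothetical helpers introduced here.

```python
def partition(arr, low, high):
    """Lomuto partition: move arr[high] to its sorted position and return that index."""
    pivot = arr[high]
    i = low  # next slot for an element smaller than the pivot
    for j in range(low, high):
        if arr[j] < pivot:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    arr[i], arr[high] = arr[high], arr[i]
    return i

def quick_sort_in_place(arr, low=0, high=None):
    """Sort arr[low..high] in place."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)
        quick_sort_in_place(arr, low, p - 1)
        quick_sort_in_place(arr, p + 1, high)

data = [3, 6, 8, 10, 1, 2, 1]
quick_sort_in_place(data)
print("Sorted in place:", data)
```

The in-place variant avoids allocating the `left`, `middle`, and `right` lists, at the cost of being harder to read.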

## 2. Graph Algorithms
Graphs represent relationships between objects and are widely used in networking, AI, and route planning.

### Breadth-First Search (BFS)
BFS explores all neighbors at the present depth before moving deeper.

In [124]:
from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])

    while queue:
        vertex = queue.popleft()
        if vertex not in visited:
            print(vertex, end=" ")
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}

bfs(graph, 'A')

A B C D E F 

**Explanation:**
- We maintain a queue and a set of visited nodes.
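- Nodes are processed layer by layer, so in an unweighted graph each node is first reached along a path with the fewest possible edges. Because neighbors are stored in sets, the order within a layer can vary between runs.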
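
The traversal above only prints nodes. To actually recover a shortest path, one common approach is to record each node's parent when it is first discovered. The sketch below assumes the same adjacency-set graph; `bfs_shortest_path` is a hypothetical helper, and neighbors are sorted only to make the result reproducible.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    parents = {start: None}          # doubles as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk parent links back to the start, then reverse
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbor in sorted(graph[vertex]):
            if neighbor not in parents:
                parents[neighbor] = vertex
                queue.append(neighbor)
    return None

graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}

print(bfs_shortest_path(graph, 'A', 'F'))
```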

## 3. Dynamic Programming
Dynamic programming (DP) optimizes recursive problems by storing previous results.

### Fibonacci Sequence using Memoization

In [125]:
def fibonacci(n, memo=None):
    if memo is None:  # Avoid a mutable default argument shared across calls
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
    return memo[n]

print("Fibonacci(10):", fibonacci(10))

Fibonacci(10): 55


**Explanation:**
- Memoization saves computed values to avoid redundant calculations.
- This reduces the time complexity from exponential to linear.
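
Two other common ways to get the same linear behavior are the standard library's `functools.lru_cache` decorator and a bottom-up loop that keeps only the last two values. Both are sketched below; the function names `fib_cached` and `fib_bottom_up` are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_cached(n):
    """Top-down: the decorator stores results of earlier calls."""
    return n if n <= 1 else fib_cached(n - 1) + fib_cached(n - 2)

def fib_bottom_up(n):
    """Bottom-up: iterate from the base cases, keeping two values."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous

print(fib_cached(10), fib_bottom_up(10))
```

The bottom-up form uses constant extra memory and avoids Python's recursion limit for large `n`.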

## 4. Backtracking
Backtracking systematically explores all possibilities to solve constraint-based problems.

### Solving the N-Queens Problem

def solve_n_queens(n):
    def is_safe(board, row, col):
        for i in range(row):
            if board[i] == col or abs(board[i] - col) == abs(i - row):
                return False
        return True

    def backtrack(row=0):
        if row == n:
            solutions.append(list(board))
            return
        for col in range(n):
            if is_safe(board, row, col):
                board[row] = col
                backtrack(row + 1)
                board[row] = -1

    solutions = []
    board = [-1] * n
    backtrack()
    return solutions

print("Solutions for 4-Queens:", solve_n_queens(4))

**Explanation:**
- The algorithm attempts to place queens row by row, checking constraints.
- If a conflict arises, it backtracks and tries a different placement.
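
Each solution returned by `solve_n_queens` is a list in which the index is the row and the value is the queen's column. To make such a list easier to read, a small hypothetical helper `render_board` can draw it as text. The sample placement `[1, 3, 0, 2]` is one valid 4-queens arrangement.

```python
def render_board(solution):
    """Turn a list like [1, 3, 0, 2] into rows of 'Q' and '.' characters."""
    n = len(solution)
    return [
        "".join("Q" if col == queen_col else "." for col in range(n))
        for queen_col in solution
    ]

board_rows = render_board([1, 3, 0, 2])
print("\n".join(board_rows))
```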

## 5. Greedy Algorithms
Greedy algorithms make locally optimal choices at each step.

### Activity Selection Problem

In [126]:
def activity_selection(activities):
    if not activities:
        return []
    ordered = sorted(activities, key=lambda x: x[1])  # Sort by finish time without mutating the input
    selected = [ordered[0]]
    last_end = ordered[0][1]

    for start, end in ordered[1:]:
        if start >= last_end:  # Compatible with the last selected activity
            selected.append((start, end))
            last_end = end

    return selected

activities = [(1, 3), (2, 5), (3, 9), (6, 8), (5, 7)]
print("Selected activities:", activity_selection(activities))

Selected activities: [(1, 3), (5, 7)]


**Explanation:**
- Activities are sorted by finishing time.
- The earliest-finishing activity is chosen first; each later activity is added only if it starts at or after the finish time of the last selected one.

## 6. Divide and Conquer
Divide and conquer breaks problems into subproblems and combines results.

### Merge Sort

In [127]:
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order (stable sort)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result

arr = [10, 3, 15, 7, 8, 23, 74, 18]
print("Sorted array:", merge_sort(arr))

Sorted array: [3, 7, 8, 10, 15, 18, 23, 74]


**Explanation:**
- The list is recursively split in half.
- Each half is sorted and then merged efficiently.

These explanations and examples deepen understanding and demonstrate algorithmic efficiency.

# Advanced Regular Expressions in Python

## Table of Contents
1. [Introduction to Regular Expressions](#1-introduction-to-regular-expressions)
2. [Basic Syntax and Metacharacters](#2-basic-syntax-and-metacharacters)
3. [Advanced Regular Expressions](#3-advanced-regular-expressions)
4. [Working with Groups and Captures](#4-working-with-groups-and-captures)
5. [Lookaheads and Lookbehinds](#5-lookaheads-and-lookbehinds)
6. [Performance Optimization](#6-performance-optimization)
7. [Practical Examples](#7-practical-examples)
8. [Exercises](#8-exercises)

## 1. Introduction to Regular Expressions
Regular expressions (regex) are sequences of characters defining search patterns. They are widely used for text processing.

In [128]:
import re
text = "Hello, my email is example@test.com"
pattern = r"\w+@\w+\.\w+"
match = re.search(pattern, text)
print("Found:", match.group() if match else "No match")

Found: example@test.com


## 2. Basic Syntax and Metacharacters
Metacharacters define patterns:
- `.` matches any character except a newline (by default)
- `^` matches the start of a string
- `$` matches the end of a string
- `*`, `+`, `{}` specify repetition

In [129]:
text = "Hello 123 world!"
pattern = r"\d+"
print(re.findall(pattern, text))  # Finds every run of consecutive digits

['123']
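
The cell above exercises only `\d+`. As a short sketch of the other metacharacters in the list (the sample strings are made up for illustration):

```python
import re

text = "cat cot cut\ncart"

dot_matches = re.findall(r"c.t", text)               # '.' = any one character except newline
starts_with_cat = bool(re.search(r"^cat", text))     # '^' anchors at the start of the string
ends_with_cart = bool(re.search(r"cart$", text))     # '$' anchors at the end of the string

zero_or_more = re.findall(r"ab*", "a ab abb")        # '*' = zero or more b's
one_or_more = re.findall(r"ab+", "a ab abb")         # '+' = one or more b's
exactly_two = re.findall(r"o{2}", "book moon so")    # '{2}' = exactly two o's

print(dot_matches, starts_with_cat, ends_with_cart)
print(zero_or_more, one_or_more, exactly_two)
```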


## 3. Advanced Regular Expressions
### Using Character Classes

In [130]:
pattern = r"[A-Z][a-z]+"
text = "My name is Alice and I live in Paris"
print(re.findall(pattern, text))

['My', 'Alice', 'Paris']


### Matching Multiple Patterns

In [131]:
pattern = r"\b(cat|dog|fish)\b"
text = "I have a cat and a dog."
print(re.findall(pattern, text))

['cat', 'dog']


## 4. Working with Groups and Captures
Grouping allows extracting specific parts of a match.

In [132]:
pattern = r"(\d{3})-(\d{2})-(\d{4})"
text = "My SSN is 123-45-6789"
match = re.search(pattern, text)
if match:
    print("Area:", match.group(1))
    print("Group:", match.group(2))
    print("Serial:", match.group(3))

Area: 123
Group: 45
Serial: 6789
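
Numbered groups become hard to follow as patterns grow. Python's `(?P<name>...)` syntax gives each group a name instead. A minimal sketch reusing the same sample text, with group names chosen here for illustration:

```python
import re

pattern = re.compile(r"(?P<area>\d{3})-(?P<group>\d{2})-(?P<serial>\d{4})")
match = pattern.search("My SSN is 123-45-6789")

if match:
    parts = match.groupdict()          # all named groups as a dictionary
    print("Area:", match.group("area"))
    print("All parts:", parts)
```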


## 5. Lookaheads and Lookbehinds
Lookaheads and lookbehinds match patterns without including them in the result.

In [133]:
# Positive lookahead
pattern = r"\d+(?= dollars)"
text = "The price is 100 dollars"
print(re.findall(pattern, text))

['100']


In [134]:
# Negative lookbehind: skip "Smith" when it is preceded by "Mr. " (note the space inside the lookbehind)
pattern = r"(?<!Mr\. )\bSmith\b"
text = "Hello Smith, Mr. Smith is here."
print(re.findall(pattern, text))

['Smith']


## 6. Performance Optimization
### Using Compiled Regular Expressions

In [135]:
pattern = re.compile(r"\d{4}")
text = "1234 5678 9101"
print(pattern.findall(text))

['1234', '5678', '9101']


### Avoiding Catastrophic Backtracking

In [136]:
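# (a+)+ nests one quantifier inside another; on a non-matching input the engine
# tries every way of splitting the run of a's before giving up.
pattern = r"(a+)+b"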
text = "aaaaaaaaaaaaaaaaac"
match = re.fullmatch(pattern, text)
print("Match found" if match else "No match")

No match
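
The pattern in the cell above is the problematic form: with nested quantifiers, the number of ways to split the run of `a` characters grows exponentially with its length. One way to avoid this is to rewrite the pattern without nesting. The sketch below, on a handful of illustrative strings, checks that `a+b` accepts and rejects the same inputs as `(a+)+b`.

```python
import re

nested = re.compile(r"(a+)+b")   # nested quantifiers: prone to exponential backtracking
flat = re.compile(r"a+b")        # same set of matching strings, no nesting

samples = ["ab", "aaab", "b", "aaac", "aaaaaaaaaaaaaaaaac"]

# Both patterns should agree on every sample
agree = all(bool(nested.fullmatch(s)) == bool(flat.fullmatch(s)) for s in samples)
results = {s: bool(flat.fullmatch(s)) for s in samples}

print("Patterns agree:", agree)
print(results)
```

The flat pattern fails quickly on the non-matching inputs because each `a` can be consumed in only one way.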


## 7. Practical Examples
### Extracting URLs from Text

In [137]:
pattern = r"https?://(?:www\.)?\w+\.\w+"
text = "Visit https://www.google.com and http://example.com"
print(re.findall(pattern, text))

['https://www.google.com', 'http://example.com']


### Validating Email Addresses

In [138]:
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
email = "user@example.com"
print("Valid" if re.match(pattern, email) else "Invalid")

Valid


## 8. Exercises
### Exercise 1: Extract Dates
- Write a regex pattern to extract dates in `YYYY-MM-DD` format.
- Test it on sample text.

### Exercise 2: Phone Number Validation
- Write a regex to validate US phone numbers (format: `(123) 456-7890`).

### Exercise 3: Password Strength Checker
- Create a regex pattern that ensures a password contains at least:
  - 8 characters
  - One uppercase letter
  - One lowercase letter
  - One digit
  - One special character


### Exercise 4: Extract Hashtags
- Write a regex pattern to extract hashtags from social media posts.

# Advanced File I/O in Python

## Table of Contents
1. [Introduction to File I/O](#1-introduction-to-file-io)
2. [Reading and Writing Text Files](#2-reading-and-writing-text-files)
3. [Handling Different File Modes](#3-handling-different-file-modes)
4. [Working with Binary Files](#4-working-with-binary-files)
5. [Using File Context Managers](#5-using-file-context-managers)
6. [Processing Large Files Efficiently](#6-processing-large-files-efficiently)
7. [Working with JSON and CSV Files](#7-working-with-json-and-csv-files)
8. [Handling File Errors and Exceptions](#8-handling-file-errors-and-exceptions)
9. [Exercises](#9-exercises)

## 1. Introduction to File I/O
File I/O (Input/Output) allows Python to interact with external files for reading, writing, and processing data.

In [139]:
# Checking if a file exists
import os
print("File exists:", os.path.exists("sample.txt"))

File exists: True


## 2. Reading and Writing Text Files
### Writing to a File

In [140]:
with open("sample.txt", "w") as file:
    file.write("Hello, world!\n")
    file.write("This is a text file example.")

### Reading from a File

In [141]:
with open("sample.txt", "r") as file:
    content = file.read()
print(content)

Hello, world!
This is a text file example.


## 3. Handling Different File Modes
- **"r"**: Read mode
- **"w"**: Write mode (overwrites file)
- **"a"**: Append mode
- **"r+"**: Read and write

Example of appending data:

In [142]:
with open("sample.txt", "a") as file:
    file.write("\nAppending new content.")
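
To see the four modes side by side, the sketch below writes, appends, and then edits a file with `"r+"`. It works in a temporary directory so it does not touch `sample.txt`; the file name `modes_demo.txt` is an assumption made for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")

    with open(path, "w") as f:      # "w": create the file or truncate it
        f.write("first line\n")

    with open(path, "a") as f:      # "a": add to the end, keeping existing content
        f.write("second line\n")

    with open(path, "r+") as f:     # "r+": read and write an existing file
        original = f.read()
        f.seek(0)                   # move back to the start before writing
        f.write("FIRST")            # overwrites the first five characters in place

    with open(path, "r") as f:      # "r": read only
        final_content = f.read()

print(final_content)
```

Unlike `"w"`, `"r+"` does not truncate the file, and it raises `FileNotFoundError` if the file does not exist.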

## 4. Working with Binary Files
Binary files are used for non-text data such as images, audio, and compiled files.

In [143]:
import urllib.request

url = "https://pbs.twimg.com/media/B-buDdYIgAAYO9d.png"
file_name = "downloaded_image.png"

urllib.request.urlretrieve(url, file_name)
print("Image successfully downloaded as 'downloaded_image.png'")

Image successfully downloaded as 'downloaded_image.png'


In [144]:
# Reading a binary file
with open("/content/downloaded_image.png", "rb") as img:
    binary_data = img.read()
print("Binary data size:", len(binary_data))

Binary data size: 251059


## 5. Using File Context Managers
Using `with open()` ensures files are properly closed after processing.
First, create a `data.txt` file with the following content:

In [145]:
# Sample content
content = """Name, Age, City
Alice, 30, New York
Bob, 25, Los Angeles
Charlie, 35, Chicago"""

# Write to data.txt
with open("/content/data.txt", "w") as file:
    file.write(content)

print("data.txt has been created with sample content.")

data.txt has been created with sample content.


In [146]:
with open("/content/data.txt", "w") as file:
    file.write("Using context managers simplifies file handling.")

## 6. Processing Large Files Efficiently
### Reading Large Files Line by Line

In [147]:
with open("/content/data.txt", "r") as file:
    for line in file:
        print(line.strip())  # Process each line

Using context managers simplifies file handling.


### Using Generators for Efficient File Processing

In [148]:
def read_large_file(filename):
    with open(filename, "r") as file:
        for line in file:
            yield line.strip()

for line in read_large_file("/content/data.txt"):
    print(line)

Using context managers simplifies file handling.

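
Line-by-line iteration assumes the file contains line breaks. For binary data or files with very long lines, a generator can instead yield fixed-size chunks. A minimal sketch, assuming a hypothetical helper `read_in_chunks` and a throwaway temporary file:

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=1024):
    """Yield successive blocks of at most chunk_size bytes from a file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty bytes object means end of file
                break
            yield chunk

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "big.bin")
    with open(path, "wb") as f:
        f.write(b"x" * 2500)       # 2500 bytes of sample data

    chunk_sizes = [len(chunk) for chunk in read_in_chunks(path, 1024)]

print("Chunk sizes:", chunk_sizes)
```

Only one chunk is held in memory at a time, so memory use stays bounded regardless of file size.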

## 7. Working with JSON and CSV Files
### JSON File Handling

In [149]:
import json

data = {"name": "Alice", "age": 25, "city": "New York"}
with open("/content/data.json", "w") as file:
    json.dump(data, file, indent=4)
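
JSON has fewer types than Python, so a round trip does not always return an identical object. The sketch below (using a temporary directory and made-up data) shows two common surprises: tuples come back as lists, and non-string keys come back as strings.

```python
import json
import os
import tempfile

original = {"point": (1, 2), 1: "one"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.json")

    with open(path, "w") as f:
        json.dump(original, f, indent=4)

    with open(path, "r") as f:
        loaded = json.load(f)

print("Loaded:", loaded)
print("Identical to original:", loaded == original)
```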

### CSV File Handling

In [150]:
import csv

data = [["Name", "Age", "City"], ["Alice", 25, "New York"], ["Bob", 30, "London"]]
with open("data.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)
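
The standard library can also read the file back without Pandas. A short sketch using `csv.DictReader`, which maps each row to the header names; note that every value is returned as a string, so numeric columns need explicit conversion. The file is written to a temporary directory here so the example is self-contained.

```python
import csv
import os
import tempfile

rows = [["Name", "Age", "City"], ["Alice", 25, "New York"], ["Bob", 30, "London"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.csv")

    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

    with open(path, "r", newline="") as f:
        records = list(csv.DictReader(f))   # first row is used as the header

ages = [int(record["Age"]) for record in records]   # convert strings to integers
print(records)
print("Ages:", ages)
```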

## 8. Handling File Errors and Exceptions
### Catching File Handling Errors

In [151]:
try:
    with open("non_existent.txt", "r") as file:
        content = file.read()
except FileNotFoundError:
    print("File not found!")

File not found!
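
`FileNotFoundError` is one of several subclasses of `OSError`. When the exact failure does not matter, catching `OSError` covers the whole family, and the exception object still says what went wrong. The sketch below also uses the `else` clause, which runs only when no exception occurred; `try_read` is a hypothetical helper.

```python
import os
import tempfile

def try_read(path):
    """Return the file's text, or a short description of the I/O error."""
    try:
        with open(path, "r") as f:
            content = f.read()
    except OSError as exc:      # parent class of FileNotFoundError, PermissionError, ...
        return f"{type(exc).__name__}: {exc.filename}"
    else:
        return content          # reached only if the read succeeded

with tempfile.TemporaryDirectory() as tmp_dir:
    existing_path = os.path.join(tmp_dir, "notes.txt")
    missing_path = os.path.join(tmp_dir, "non_existent.txt")
    with open(existing_path, "w") as f:
        f.write("hello")

    existing_result = try_read(existing_path)
    missing_result = try_read(missing_path)

print(existing_result)
print(missing_result)
```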


## 9. Exercises
### Exercise 1: Read and Write Text Files
- Create a text file and write multiple lines.
- Read the file and display its content.

### Exercise 2: Process a Large File
- Read a large file line by line.
- Count occurrences of a specific word in the file.

### Exercise 3: Work with JSON
- Create a dictionary, convert it to JSON, and save it.
- Load the JSON file and print its contents.

### Exercise 4: Handle CSV Files
- Create a CSV file with tabular data.
- Read the CSV file into a Pandas DataFrame.




### Exercise 5: Exception Handling in File I/O
- Implement error handling for missing files.
- Handle permission errors when accessing a file.

# Summary: Python Comprehensive Exercise

## Problem Statement
You are given a dataset containing customer transactions, and your task is to:
- **Read and preprocess the dataset using Pandas**
- **Perform text extraction using regular expressions**
- **Implement an algorithm to detect fraud**
- **Store the results into a structured file**

## Steps to Solve the Problem

### 1. Load Data from a File
The dataset is stored in a CSV file named `transactions.csv`. It contains the following columns:
- `transaction_id`: Unique transaction identifier
- `customer_name`: Full name of the customer
- `email`: Email address
- `transaction_amount`: Amount spent
- `transaction_date`: Date of transaction
- `description`: Transaction details

In [154]:
import pandas as pd

# Load the dataset
df = pd.read_csv("/content/transactions.csv")
print(df.head())

   transaction_id  customer_name                email  transaction_amount  \
0            1001  Alice Johnson      alice@gmail.com                 200   
1            1002      Bob Smith        bob@yahoo.com                7500   
2            1003    Charlie Lee  charlie@hotmail.com                  50   
3            1004    David Brown    david@company.com                3000   
4            1005    Emma Wilson    emma@business.org                 600   

  transaction_date                   description  
0       2024-02-01  Purchase of electronic items  
1       2024-02-02     Refund for previous order  
2       2024-02-03             Coffee and snacks  
3       2024-02-04      Large furniture purchase  
4       2024-02-05            Chargeback dispute  


### 2. Extract Information Using Regular Expressions
We will extract:
- Domains from email addresses
- Keywords from the transaction description

In [155]:
import re

In [156]:
# Extract email domains
df["email_domain"] = df["email"].apply(lambda x: re.search(r"@(.+)", x).group(1))

# Extract keywords from transaction descriptions
df["keywords"] = df["description"].apply(lambda x: re.findall(r"\b[A-Za-z]+\b", x))
print(df.head())

   transaction_id  customer_name                email  transaction_amount  \
0            1001  Alice Johnson      alice@gmail.com                 200   
1            1002      Bob Smith        bob@yahoo.com                7500   
2            1003    Charlie Lee  charlie@hotmail.com                  50   
3            1004    David Brown    david@company.com                3000   
4            1005    Emma Wilson    emma@business.org                 600   

  transaction_date                   description  email_domain  \
0       2024-02-01  Purchase of electronic items     gmail.com   
1       2024-02-02     Refund for previous order     yahoo.com   
2       2024-02-03             Coffee and snacks   hotmail.com   
3       2024-02-04      Large furniture purchase   company.com   
4       2024-02-05            Chargeback dispute  business.org   

                            keywords  
0  [Purchase, of, electronic, items]  
1     [Refund, for, previous, order]  
2              [Coffee, and, snacks]  
3       [Large, furniture, purchase]  
4              [Chargeback, dispute]  
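
The `apply`/`re.search(...).group(1)` pattern above raises `AttributeError` if an address contains no `@`. As a minimal sketch, assuming the same `email` and `description` columns, the vectorized string methods `Series.str.extract` and `Series.str.findall` perform the same extraction and return `NaN` for non-matching rows instead of failing. The sample values below are hypothetical and include one deliberately malformed address.

```python
import pandas as pd

# Hypothetical sample; the second address is deliberately malformed.
emails = pd.Series(["alice@gmail.com", "not-an-email", "emma@business.org"])

# str.extract returns NaN where the pattern does not match,
# instead of raising AttributeError like re.search(...).group(1) on None.
domains = emails.str.extract(r"@(.+)$", expand=False)

# str.findall is the vectorized counterpart of re.findall.
descriptions = pd.Series(["Coffee and snacks", "Chargeback dispute"])
keywords = descriptions.str.findall(r"\b[A-Za-z]+\b")

print(domains.tolist())
print(keywords.tolist())
```

Rows with a missing domain can then be inspected or dropped with `domains.isna()` before further analysis.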

### 3. Implement Fraud Detection Algorithm
A transaction is flagged as **suspicious** if at least one of the following holds:
- The amount exceeds $5,000.
- The description contains the word "refund" or "chargeback".

In [157]:
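def detect_fraud(row):
    """Flag a transaction that is large or mentions a refund or chargeback."""
    if row["transaction_amount"] > 5000:
        return True
    # Lowercase each keyword so that "Refund" and "Chargeback" also match
    if any(word.lower() in ("refund", "chargeback") for word in row["keywords"]):
        return True
    return False

df["fraudulent"] = df.apply(detect_fraud, axis=1)
print(df[df["fraudulent"]])

   transaction_id customer_name              email  transaction_amount  \
1            1002     Bob Smith      bob@yahoo.com                7500   
4            1005   Emma Wilson  emma@business.org                 600   

  transaction_date                description  email_domain  \
1       2024-02-02  Refund for previous order     yahoo.com   
4       2024-02-05         Chargeback dispute  business.org   

                         keywords  fraudulent  
1  [Refund, for, previous, order]        True  
4           [Chargeback, dispute]        True  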
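
The two rules can also be expressed without a row-wise `apply`. The following is a sketch under stated assumptions, not a replacement for the function above: it rebuilds a small hypothetical frame from the five rows shown earlier, uses a case-insensitive whole-word match on `description`, and stores the result in a column named `suspicious` (a name chosen here for illustration).

```python
import pandas as pd

# Hypothetical rows mirroring the first five transactions shown above.
sample = pd.DataFrame({
    "transaction_amount": [200, 7500, 50, 3000, 600],
    "description": [
        "Purchase of electronic items",
        "Refund for previous order",
        "Coffee and snacks",
        "Large furniture purchase",
        "Chargeback dispute",
    ],
})

# Rule 1: amount above the threshold.
large_amount = sample["transaction_amount"] > 5000

# Rule 2: description mentions either word as a whole word, ignoring case.
# The non-capturing group (?:...) avoids pandas' warning about match groups.
risky_word = sample["description"].str.contains(
    r"\b(?:refund|chargeback)\b", case=False, regex=True
)

# A row is suspicious if either rule applies.
sample["suspicious"] = large_amount | risky_word
print(sample.loc[sample["suspicious"], ["transaction_amount", "description"]])
```

Boolean masks like these are evaluated column-wise, which tends to scale better than `apply(axis=1)` on large datasets.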


### 4. Save the Processed Data to a New File

In [158]:
# Save the cleaned and analyzed data
df.to_csv("processed_transactions.csv", index=False)
print("Processed data saved successfully.")

Processed data saved successfully.
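
One detail to keep in mind when reloading the saved file: CSV has no list type, so a list-valued column such as `keywords` comes back as a plain string. The sketch below illustrates this with a hypothetical one-row frame and an in-memory buffer in place of a real file, then restores the lists with `ast.literal_eval`.

```python
import ast
import io

import pandas as pd

# Hypothetical one-row frame with a list-valued column like "keywords".
frame = pd.DataFrame({
    "transaction_id": [1005],
    "keywords": [["Chargeback", "dispute"]],
})

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# After the round trip the list is a plain string, not a Python list.
reloaded = pd.read_csv(buffer)
print(type(reloaded.loc[0, "keywords"]))

# ast.literal_eval safely parses the string back into a Python list.
reloaded["keywords"] = reloaded["keywords"].apply(ast.literal_eval)
print(reloaded.loc[0, "keywords"])
```

If list columns need to survive storage unchanged, a format such as JSON may be a better fit than CSV.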


## Exercises

### Exercise 1: Identify Suspicious Email Domains
- Find the five most frequent email domains among fraudulent transactions.
- Write a function that flags transactions whose email domain appears only rarely in the dataset.

- **Topics:** String manipulation, Pandas DataFrames, Aggregation
- **Resources:**
  - [Pandas String Methods](https://pandas.pydata.org/docs/user_guide/text.html)
  - [Regular Expressions in Python](https://docs.python.org/3/library/re.html)
  - [Finding Frequent Elements in Pandas](https://towardsdatascience.com/finding-the-most-frequent-elements-in-a-pandas-dataframe-b29d01fe43cf)
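
One possible starting point for Exercise 1, offered as a sketch rather than a reference solution. The data is hypothetical, the column names follow the walkthrough above, and both the helper name `flag_rare_domains` and its `min_count` threshold are assumptions to adapt.

```python
import pandas as pd

# Hypothetical data; column names follow the walkthrough above.
df = pd.DataFrame({
    "email_domain": ["gmail.com", "gmail.com", "yahoo.com", "rare.biz", "gmail.com"],
    "fraudulent": [True, False, True, True, True],
})

# Count domains among fraudulent rows and keep the five most frequent.
top_fraud_domains = df.loc[df["fraudulent"], "email_domain"].value_counts().head(5)
print(top_fraud_domains)


def flag_rare_domains(frame, min_count=2):
    """Return a boolean Series: True where the domain occurs fewer than min_count times."""
    # Map each row's domain to its overall frequency in the frame.
    counts = frame["email_domain"].map(frame["email_domain"].value_counts())
    return counts < min_count


df["rare_domain"] = flag_rare_domains(df)
print(df)
```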

### Exercise 2: Regular Expressions for Data Validation
- Validate that email addresses in the dataset are correctly formatted.
- Identify and extract all numeric values appearing in descriptions.

- **Topics:** Regex for validation, extracting numerical data, pattern matching
- **Resources:**
  - [Python Regular Expressions Official Docs](https://docs.python.org/3/library/re.html)
  - [Regex101 - Online Regex Tester](https://regex101.com/) (for testing expressions)
  - [Validating Email Addresses with Regex](https://www.geeksforgeeks.org/check-if-email-address-valid-or-not-in-python/)
  - [Extracting Numbers from Text in Python](https://www.datacamp.com/tutorial/python-regular-expression-tutorial)
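
A possible starting point for Exercise 2, using only the standard library. The email pattern is a deliberately simple assumption (it is not a full RFC 5322 validator), and the function names are hypothetical.

```python
import re

# Simple pattern (an assumption): characters other than "@" and whitespace,
# one "@", then a domain ending in a dot plus at least two letters.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def is_valid_email(address):
    """Return True if the whole string matches EMAIL_PATTERN."""
    return EMAIL_PATTERN.fullmatch(address) is not None


def extract_numbers(text):
    """Return all integers and decimals found in text, as strings."""
    return re.findall(r"\d+(?:\.\d+)?", text)


print(is_valid_email("alice@gmail.com"))
print(is_valid_email("alice@gmail"))
print(extract_numbers("Refund of 49.99 for order 1002"))
```

Using `fullmatch` rather than `search` ensures that trailing or leading junk around an otherwise valid address is rejected.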

### Exercise 3: Optimize the Algorithm
- Improve fraud detection by incorporating past customer transaction history.
- Implement an efficient way to flag repeated transactions within a short period.

- **Topics:** Algorithm optimization, time complexity, transaction analysis
- **Resources:**
  - [Python Performance Optimization](https://realpython.com/python-performance/)
  - [Big-O Notation for Algorithm Complexity](https://www.geeksforgeeks.org/analysis-of-algorithms-big-o-analysis/)
  - [Kaggle Learn: Data Cleaning](https://www.kaggle.com/learn/data-cleaning)
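
For the second task in Exercise 3, one approach is to sort once and compare each transaction only with the same customer's previous one, which avoids comparing every pair of rows. This is a sketch under assumptions: the timestamps with times of day are invented for illustration, and the 10-minute window and the `rapid_repeat` column name are arbitrary choices.

```python
import pandas as pd

# Hypothetical history; column names follow the dataset above,
# but the times of day are invented for illustration.
history = pd.DataFrame({
    "customer_name": ["Bob Smith", "Bob Smith", "Emma Wilson", "Bob Smith"],
    "transaction_date": pd.to_datetime([
        "2024-02-02 10:00", "2024-02-02 10:03", "2024-02-05 09:00", "2024-02-03 12:00",
    ]),
})

# Sort once (O(n log n)), then look only at each customer's previous transaction.
history = history.sort_values(["customer_name", "transaction_date"])
gap = history.groupby("customer_name")["transaction_date"].diff()

# Assumed window: 10 minutes. A customer's first transaction has no gap (NaT),
# and comparisons against NaT evaluate to False, so it is not flagged.
history["rapid_repeat"] = gap <= pd.Timedelta(minutes=10)
print(history)
```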


### Exercise 4: File Handling and Reporting
- Generate a summary report of fraudulent transactions and save it to a JSON file.
- Create a function that reads the JSON report and prints key insights.

- **Topics:** File I/O, JSON handling, saving structured reports
- **Resources:**
  - [Python File Handling](https://realpython.com/read-write-files-python/)
  - [Working with JSON in Python](https://realpython.com/python-json/)
  - [Generating and Parsing Reports in Pandas](https://towardsdatascience.com/how-to-generate-reports-with-python-and-pandas-166fdfaf0df4)
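
A possible skeleton for Exercise 4. The flagged rows are hypothetical, the report fields (`count`, `total_amount`, `transaction_ids`) are assumptions about what a useful summary might contain, and the file is written to the system temporary directory to keep the sketch side-effect free.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical flagged transactions; the report fields below are assumptions.
flagged = pd.DataFrame({
    "transaction_id": [1002, 1005],
    "transaction_amount": [7500, 600],
})


def write_report(frame, path):
    """Summarise flagged transactions and save the summary as JSON."""
    report = {
        # Cast NumPy integers to plain int so json.dump can serialise them.
        "count": int(len(frame)),
        "total_amount": int(frame["transaction_amount"].sum()),
        "transaction_ids": frame["transaction_id"].tolist(),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)


def read_report(path):
    """Load the JSON report, print its key figures, and return it."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    print(f"{report['count']} flagged transactions, total {report['total_amount']}")
    return report


report_path = os.path.join(tempfile.gettempdir(), "fraud_report.json")
write_report(flagged, report_path)
loaded = read_report(report_path)
```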


### Exercise 5: Improve Fraud Detection Using Data Patterns
- **Topics:** Fraud detection, anomaly detection, historical analysis
- **Resources:**
  - [PaySim Synthetic Financial Dataset for Fraud Detection (Kaggle)](https://www.kaggle.com/datasets/ntnu-testimon/paysim1)
  - [Scikit-learn Outlier Detection Techniques](https://scikit-learn.org/stable/modules/outlier_detection.html)
  - [Building a Machine Learning-Based Fraud Detection System](https://towardsdatascience.com/credit-card-fraud-detection-using-machine-learning-726ed4e3b3af)
  - [Anomaly Detection in Pandas](https://towardsdatascience.com/anomaly-detection-in-python-part-1-49b65b0522dc)


### **Additional General Resources**
- **Pandas Official Documentation:** [https://pandas.pydata.org/docs/](https://pandas.pydata.org/docs/)
- **Python Regular Expressions:** [https://docs.python.org/3/library/re.html](https://docs.python.org/3/library/re.html)
- **Python Algorithms and Data Structures:** [https://www.geeksforgeeks.org/python-data-structures-and-algorithms/](https://www.geeksforgeeks.org/python-data-structures-and-algorithms/)
- **File Handling in Python:** [https://realpython.com/read-write-files-python/](https://realpython.com/read-write-files-python/)

