# 📊 SQL Practice

## 1. Select Specific Columns
Write a query to select `first_name`, `last_name` from a table called `employees`.
```sql
SELECT first_name, last_name FROM employees;
```

## 2. Filter Records

Get all rows from `orders` where the `order_date` is in 2023 and `status` is `'completed'`.

```sql
SELECT * FROM orders
WHERE YEAR(order_date) = 2023 AND status = 'completed';
```

## 3. Aggregate Function

Find the average salary from a table `salaries`.

```sql
SELECT AVG(salary) AS avg_salary FROM salaries;
```

## 4. GROUP BY and HAVING

From a table `sales`, group by region and return regions with total sales greater than 100,000.

```sql
SELECT region, SUM(sales) AS total
FROM sales
GROUP BY region
HAVING SUM(sales) > 100000;
```
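
A quick way to try this query without a database server is Python's built-in `sqlite3` module with an in-memory database. This is a minimal sketch: the rows are invented sample data, and `ORDER BY region` is added only so the output order is predictable.

```python
import sqlite3

# In-memory database with a toy `sales` table (made-up rows for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 60000), ("North", 50000), ("South", 40000), ("East", 120000)],
)

# GROUP BY builds one group per region; HAVING then filters the groups by their SUM
rows = conn.execute(
    """
    SELECT region, SUM(sales) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(sales) > 100000
    ORDER BY region
    """
).fetchall()
print(rows)
```

Putting the condition in `WHERE SUM(sales) > 100000` instead would fail, because `WHERE` filters individual rows before the groups and their sums exist.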

## 5. JOIN

Join two tables: `customers` and `orders`, and return all customer names with their corresponding order IDs.

```sql
SELECT c.name, o.order_id
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id;
```

---

# 🐍 Python Practice

## 1. List Comprehension

Create a list of squares of even numbers from 1 to 20.

```python
squares = [val**2 for val in range(1, 21) if val % 2 == 0]
print(squares)
```

## 2. Dictionary Count

Given a string, count the frequency of each character using a dictionary.

```python
def count_freq(text):
    frequency = {}
    for char in text:
        if char in frequency:
            frequency[char] += 1
        else:
            frequency[char] = 1
    return frequency
        
text = "asdsdfsadafbnh"
print(count_freq(text))
```
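
The standard library's `collections.Counter` does the same counting in a single call. A sketch, with `count_freq_counter` as a hypothetical name chosen so it does not clash with the function above:

```python
from collections import Counter

def count_freq_counter(text):
    # Counter does the "increment, or start at 1" bookkeeping of the loop above;
    # dict() converts it back to a plain dictionary
    return dict(Counter(text))

print(count_freq_counter("asdsdfsadafbnh"))
```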

## 3. Class and Object

Define a `Car` class with attributes: brand and mileage. Write a method to display the info.

```python
class Car:
    def __init__(self, brand, mileage, makeyear):
        self.brand = brand
        self.mileage = mileage
        self.makeyear = makeyear

    def display(self):
        print(f"Brand: {self.brand}")
        print(f"Mileage: {self.mileage} km/l")
        print(f"Make Year: {self.makeyear}")

c = Car("Toyota", 32, 2022)
c.display()
```

## 4. File I/O

Write a program to read a file `input.txt` and print all lines that contain the word 'Python'.

```python
with open("input.txt", "r", encoding="utf-8") as file:
    for line in file:
        # Case-insensitive match, so "python" and "PYTHON" also count
        if "python" in line.lower():
            print(line.rstrip("\n"))
```

## 5. Function with Default Arguments

Write a function `greet(name="Guest")` that prints `"Hello, <name>!"`.

```python
def greet(name="Guest"):
    return f"Hello, {name}!"

print(greet())         # uses the default: Hello, Guest!
print(greet("Alice"))  # Hello, Alice!
```

---

# 🧪 Pandas Practice

## 1. Read CSV

Read a CSV file `data.csv` into a DataFrame and print the first 5 rows.

```python
import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())
```

## 2. Filter Rows

From a DataFrame `df`, select rows where `age > 30` and `gender == 'Male'`.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 35, 45, 28, 50],
    'gender': ['Female', 'Male', 'Male', 'Male', 'Female']
}

df = pd.DataFrame(data)
filtered_df = df[(df['age'] > 30) & (df['gender'] == 'Male')]
print(filtered_df)
```

## 3. Group By and Mean

Group a DataFrame `df` by `department` and calculate the average `salary`.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR', 'Finance', 'IT'],
    'salary': [50000, 70000, 65000, 72000, 52000, 63000, 71000]
}

df = pd.DataFrame(data)
avg_salary = df.groupby("department")['salary'].mean()
print(avg_salary)
```

## 4. Missing Data

From DataFrame `df`, drop all rows where the column `email` is null.

```python
filter_df = df.dropna(subset=["email"])
```

## 5. New Column

Add a new column `taxed_salary` to `df` which is `salary * 0.9`.

```python
df['taxed_salary'] = df['salary'] * 0.9
```



### 🟦 **SQL Questions**

1. **Find second highest salary**
   Write a query to find the second highest salary from the `employees` table.

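   ```sql
   -- Option 1: the highest salary that is below the maximum
   SELECT MAX(salary) AS second_highest_salary
   FROM employees
   WHERE salary < (SELECT MAX(salary) FROM employees);

   -- Option 2: skip the top distinct salary (LIMIT/OFFSET: MySQL, PostgreSQL, SQLite)
   SELECT DISTINCT salary
   FROM employees
   ORDER BY salary DESC
   LIMIT 1 OFFSET 1;
   ```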

2. **Join two tables**
   Given tables `employees(emp_id, name, dept_id)` and `departments(dept_id, dept_name)`, write a query to get employee names along with their department names.

   ``` SQL
       SELECT e.name, d.dept_name FROM EMPLOYEES e
       JOIN DEPARTMENTS d ON
       e.dept_id = d.dept_id
   ```

3. **Aggregate function**
   From a `sales` table with columns `(id, product, quantity, price)`, find the total revenue per product.

   ``` SQL
       SELECT product, SUM(price * quantity) AS total_revenue FROM SALES
       GROUP BY product
   ```

4. **Filter using `IN` and `BETWEEN`**
   From a `students` table, retrieve names of students who scored between 80 and 90 and are in class 10 or 12.

   ``` SQL
       SELECT name FROM STUDENTS
       WHERE MARKS BETWEEN 80 AND 90 
       AND CLASS IN (10,12);
   ```

5. **Group by and having**
   From an `orders` table `(order_id, customer_id, amount)`, find customers who have placed more than 3 orders.

   ``` SQL
   SELECT customer_id, COUNT(*) AS total_orders
   FROM orders
   GROUP BY customer_id
   HAVING COUNT(*) > 3;
   ```
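
The two queries in question 1 behave differently when there is no second distinct salary: the `MAX` version always returns one row (holding `NULL`), while `LIMIT 1 OFFSET 1` returns no rows at all. A minimal sketch that shows this through Python's `sqlite3` module; the table contents and the helper name `second_highest` are made up for illustration.

```python
import sqlite3

def second_highest(conn):
    # Option 1: MAX below the overall MAX; always one row (NULL if nothing qualifies)
    via_subquery = conn.execute(
        "SELECT MAX(salary) FROM employees "
        "WHERE salary < (SELECT MAX(salary) FROM employees)"
    ).fetchall()
    # Option 2: distinct salaries, highest first, skip one; zero rows if nothing qualifies
    via_offset = conn.execute(
        "SELECT DISTINCT salary FROM employees "
        "ORDER BY salary DESC LIMIT 1 OFFSET 1"
    ).fetchall()
    return via_subquery, via_offset

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("A", 900), ("B", 900), ("C", 700)])
print(second_highest(conn))  # both approaches find 700

conn.execute("DELETE FROM employees WHERE salary = 700")
print(second_highest(conn))  # only one distinct salary is left
```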

---

### 🟨 **Python Questions**

1. **List comprehension**
   Given a list of numbers, return a new list containing only the even numbers squared.

   ```python
   numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
   evens_squared = [val ** 2 for val in numbers if val % 2 == 0]
   print(evens_squared)
   ```

2. **Dictionary counting**
   Count the frequency of each word in the string: `"apple orange apple banana orange apple"`.
    ```python
        string = "apple orange apple banana orange apple"
        words = string.split()
    
        frequency = {}
        for word in words:
            if word in frequency:
                frequency[word] += 1
            else:
                frequency[word] = 1
        for word, count in frequency.items():
            print(f"{word}: {count}")
    ```
   
   

3. **Functions and default args**
   Write a function that takes `name` and `greeting="Hello"` and returns `"Hello, Name"` (capitalized).

   ``` python
   def greet(name, greeting="Hello"):
       # capitalize() upper-cases the first letter and lower-cases the rest
       return f"{greeting}, {name.capitalize()}"

   print(greet("ram"))  # Hello, Ram
   ```

4. **Reversing a string**
   Reverse the string `"DataEngineering"` without using `[::-1]`.

   ``` python
   # Reference only: slicing, which this exercise rules out
   def reverse_with_slice(word):
       return word[::-1]
   print(reverse_with_slice("DataEngineering"))

   # Without slicing: prepend each character to the result
   def reverse_string(word):
       reverse = ""
       for char in word:
           reverse = char + reverse
       return reverse
   print(reverse_string("DataEngineering"))
   ```

5. **Fibonacci with recursion**
   Write a recursive function to return the nth Fibonacci number.
    ```python
       def fibonacci(n):
               if n<=1:
                   return n
               else:
                   return fibonacci(n-1) + fibonacci(n-2)
       
       for n in range(0,11):
           print(fibonacci(n),end=' ')
    ```
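
The recursive version above recomputes the same subproblems many times, so its running time grows exponentially with `n`. A sketch of two common fixes, caching results with `functools.lru_cache` and a bottom-up loop; the names `fibonacci_memo` and `fibonacci_iter` are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci_memo(n):
    # Same recurrence as above, but each n is computed once and then cached
    if n <= 1:
        return n
    return fibonacci_memo(n - 1) + fibonacci_memo(n - 2)

def fibonacci_iter(n):
    # Bottom-up: keep only the last two values, O(n) time and O(1) extra space
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci_memo(30), fibonacci_iter(30))
```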
---

### 🟩 **Pandas Questions**

1. **Filtering rows**
   From a DataFrame `df` with a column `age`, filter rows where age > 25 and age < 40.

   ```python
       import pandas as pd
       data = { "name": ["ram", "shyam", "hari"],
               "age": [23, 34, 45]
              }
       df = pd.DataFrame(data)
       new_df = df[(df['age'] > 25) & (df['age'] < 40)]
       print(new_df)
   ```

2. **Group by and aggregation**
   From a DataFrame with columns `department` and `salary`, find the average salary per department.

   ```python
       import pandas as pd
       data = {"name": ["ram", "shyam", "hari","sita"],
               "age": [23,24,25, 30],
               "department": ["IT","Marketing", "Sales", "Admin"],
               "salary": [ 50000,20000,20000,40000]
               }
       df = pd.DataFrame(data)
       new_df = df.groupby('department')['salary'].mean()
       print(new_df)
   ```
   
3. **Merge two DataFrames**
   Merge DataFrames `df1(emp_id, name)` and `df2(emp_id, salary)` on `emp_id`.

   ```python
       import pandas as pd
       data1 = {"emp_id": [ 1,2,3,4],
              "name": ["ram", "shyam", "hari","sita"]
             }
       data2 = {"emp_id": [ 1,2,3,4],
              "salary": [ 50000,20000,20000,40000]
             }
       df1 = pd.DataFrame(data1)
       df2 = pd.DataFrame(data2)
       new_df = pd.merge(df1,df2, on = 'emp_id', how = 'inner')
       print(new_df)
   ```
   
4. **Add new column with condition**
   Given a DataFrame `df` with column `marks`, add a new column `grade` as follows:

   * `A` if marks ≥ 90
   * `B` if marks ≥ 75
   * `C` otherwise

   ```python
       import pandas as pd
       data = {"name": ["ram", "shyam", "hari","sita"],
               "marks": [60,70,80,90],
               }
       df = pd.DataFrame(data)
       # Conditional assignment with .loc: each boolean mask picks the rows for one grade
       df.loc[df['marks'] >= 90, 'grade'] = 'A'
       df.loc[(df['marks'] >= 75) & (df['marks'] < 90), 'grade'] = 'B'
       df.loc[df['marks'] < 75, 'grade'] = 'C'

       print(df)
   ```
   
5. **Missing values**
   From a DataFrame with missing values, fill NaNs in column `age` with the mean age.

   ```python
       import pandas as pd
       data = {"name": ["ram", "shyam", "hari","sita"],
               "age": [23, 24, None, 30],
               "department": ["IT","Marketing", "Sales", "Admin"],
               "salary": [ 50000,20000,20000,40000]
               }
       df = pd.DataFrame(data)
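       mean_val = df['age'].mean()
       # Assign the result back: calling fillna(inplace=True) on df['age'] is chained
       # assignment, which warns in pandas 2.x and does not update df under Copy-on-Write
       df['age'] = df['age'].fillna(mean_val)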
       print(df)
   ```


### 🟦 SQL (Slightly tougher joins, grouping, string filtering)

1. **Employees per Department**
   Count how many employees each department has. Table: `employees(dept_id, emp_id)`

   ``` SQL
   SELECT dept_id, COUNT(emp_id) AS employee_count
   FROM employees
   GROUP BY dept_id;
   ```

2. **Substring filter**
   Find customers whose email ends with `'@gmail.com'`. Table: `customers(email)`

   ``` SQL
       SELECT * FROM CUSTOMERS 
       WHERE email LIKE '%@gmail.com'
   ```
   
3. **LEFT JOIN with missing data**
   Get all products and their order counts. Include products that have never been ordered.
   Tables: `products(product_id, name)`, `orders(order_id, product_id)`

   ``` SQL
   SELECT p.name, COUNT(o.order_id) AS order_count
   FROM products p
   LEFT JOIN orders o
     ON p.product_id = o.product_id
   GROUP BY p.product_id, p.name;
   ```
   
4. **Top N rows**
   Fetch the top 3 highest-paid employees from `employees(emp_id, name, salary)`

   ``` SQL
       SELECT emp_id, name, salary FROM employees
       ORDER BY salary DESC
       LIMIT 3;
   ```
   
5. **Group by multiple columns**
   Count how many orders were placed per `region` and `month`. Table: `orders(order_id, region, order_date)` (Hint: extract month)

   ``` SQL
       SELECT region, MONTH(order_date) AS order_month, COUNT(order_id) AS order_count FROM orders
       GROUP BY region, MONTH(order_date);
   ```
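
Question 3 relies on counting a column from the right-hand table. For a product with no orders, the `LEFT JOIN` still produces one row, with `NULL` in the `orders` columns: `COUNT(o.order_id)` skips that `NULL` and reports 0, whereas `COUNT(*)` would report 1. A small in-memory SQLite sketch with invented data that makes the difference visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, product_id INTEGER);
    INSERT INTO products VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO orders VALUES (10, 1), (11, 1);
    """
)

# 'ink' has no orders: its joined row has o.order_id = NULL
rows = conn.execute(
    """
    SELECT p.name,
           COUNT(o.order_id) AS order_count,   -- ignores the NULL
           COUNT(*)          AS rows_in_group  -- counts the NULL row too
    FROM products p
    LEFT JOIN orders o ON p.product_id = o.product_id
    GROUP BY p.product_id, p.name
    ORDER BY p.product_id
    """
).fetchall()
print(rows)
```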
   
---

### 🟨 Python (Functions, sorting, dictionary, lambda)

1. **Sum of digits**
   Write a function that returns the sum of digits of a given number (e.g., `123` → `6`).

   ``` Python
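   # Arithmetic version: % 10 reads the last digit, // 10 drops it
   def sum_of_digits(number):
       number = abs(number)  # treat -123 like 123
       total = 0
       while number > 0:
           total += number % 10
           number //= 10
       return total

   print(sum_of_digits(123))  # 6

   # String version: add up each digit character
   def sum_of_digits_str(number):
       return sum(int(d) for d in str(abs(number)))

   print(sum_of_digits_str(123))  # 6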
   ```
   
2. **Sort by length**
   Given a list of strings, sort it based on string length.

   ``` Python
       fruits = ["apple", "banana", "cherry", "date", "elderberry"]
       new_list = sorted(fruits, key = len)
       print(new_list)
   ```
   
3. **Frequency dictionary from list**
   From the list `['apple', 'banana', 'apple', 'orange']`, create a dictionary showing how many times each word appears.

   ``` Python
   words = ['apple', 'banana', 'apple', 'orange']
   frequency = {}
   for word in words:
       # get() returns 0 the first time a word is seen
       frequency[word] = frequency.get(word, 0) + 1
   for word, count in frequency.items():
       print(f"{word}: {count}")
   ```
   
4. **Lambda function with map**
   Square all numbers in a list using `map()` and `lambda`.

   ``` Python
       lists = [1,2,3,4,5,6,7,8,9,10]
       squared = list(map(lambda x: x**2, lists))
       print(squared)
   ```
   
5. **Reverse words in a sentence**
   Input: `"hello world from python"` → Output: `"python from world hello"`

   ``` Python
   def reversing(sentence):
       words = sentence.split()
       # reversed() walks the words back to front; join() puts one space between them
       return " ".join(reversed(words))

   sentence = "hello world from python"
   print(reversing(sentence))  # python from world hello
   ```
   
---

### 🟩 Pandas (Filtering, apply, merging, groupby)

1. **Filter with multiple conditions**
   From DataFrame `df`, filter rows where `salary > 40000` and `department == 'IT'`.

   ``` Python
       import pandas as pd
       data = {'name': ['ram', 'shyam','hari'],
               'salary': [50000,30000,20000],
               'department': ['IT','sales','admin']
              }
       df = pd.DataFrame(data)
       df = df[(df['salary'] > 40000) & (df['department'] == 'IT')]
       print(df)
   ```
   
2. **Add column using `apply()`**
   Given a `marks` column, add a new column `status` as `'Pass'` if marks ≥ 40, else `'Fail'`.

   ``` Python
       import pandas as pd
       data = {'name': ['ram', 'shyam','hari'],
               'marks': [50,30,80]
              }
       df = pd.DataFrame(data)
       # apply() calls the lambda once for each value in the column
       df['status'] = df['marks'].apply(lambda m: 'Pass' if m >= 40 else 'Fail')
       print(df)
   ```
   
3. **Merge with different key names**
   Merge `df1(emp_id)` and `df2(id)` on `emp_id = id`.

   ``` Python
       import pandas as pd
       
       data1 = {'emp_id' : [1,2,3],
               'salary': [50000,30000,20000],
               'department': ['IT','sales','admin']
              }
       data2 = {'id' : [1,2,3],
               'name': ['ram', 'shyam','hari'],
               'marks': [50,30,80]
              }
       df1 = pd.DataFrame(data1)
       df2 = pd.DataFrame(data2)
       df = pd.merge(df1,df2, left_on = 'emp_id', right_on = 'id', how = 'inner')
       print(df)
   ```
   
4. **Groupby and multiple aggregations**
   On `df(department, salary)`, compute both average and max salary per department.

   ``` Python
   import pandas as pd
   data = {"name": ["ram", "shyam", "hari", "sita"],
           "age": [23, 24, 25, 30],
           "department": ["IT", "Marketing", "Sales", "Admin"],
           "salary": [50000, 20000, 20000, 40000]
           }
   df = pd.DataFrame(data)
   new_df = df.groupby('department')['salary'].agg(['mean', 'max'])
   print(new_df)
   ```
   
5. **Sort DataFrame by multiple columns**
   Sort `df` by `department` ascending and `salary` descending.

   ``` Python
        import pandas as pd
        data = {"name": ["ram", "shyam", "hari","sita"],
                "age": [23,24,25, 30],
                "department": ["IT","Marketing", "Sales", "Admin"],
                "salary": [ 50000,20000,20000,40000]
                }
        df = pd.DataFrame(data)
        new_df = df.sort_values(by = ['department','salary'], ascending = [True, False])
        print(new_df)
   ```
   
---

## 🟦 SQL – Day 3 (Intermediate Focus)

1. **Employees earning above department average**
   From `employees(emp_id, name, salary, dept_id)`, find those whose salary is above the average salary of their department.

    ``` sql
    SELECT e.emp_id, e.name, e.salary, e.dept_id
    FROM employees e
    WHERE e.salary > (
        -- The inner query needs its own alias (e2) so that e.dept_id
        -- still refers to the outer row, making this a correlated subquery
        SELECT AVG(e2.salary)
        FROM employees e2
        WHERE e2.dept_id = e.dept_id
    );
    ```
    
2. **Self Join: Find managers**
   In a table `employees(emp_id, name, manager_id)`, list each employee with their manager’s name.

    ``` sql
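    -- LEFT JOIN keeps employees without a manager (manager_name is NULL)
    SELECT e.name AS employee_name, m.name AS manager_name
    FROM employees e
    LEFT JOIN employees m
    ON e.manager_id = m.emp_id;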
    ```
    
3. **CTE (Common Table Expression)**
   Using a CTE or subquery, find the second highest salary from `employees`.

    ``` sql
        SELECT MAX(salary) as second_highest_salary from employees
        WHERE salary<(SELECT MAX(salary) FROM employees);
    ```
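
Question 3 mentions a CTE, while the answer uses a subquery. A sketch of a CTE version, run against an in-memory SQLite table with made-up rows; it assumes the SQLite bundled with Python is 3.25 or newer, the first release with window functions such as `DENSE_RANK()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("ram", 500), ("shyam", 500), ("hari", 400), ("sita", 300)])

# The CTE ranks salary levels; DENSE_RANK gives tied salaries the same rank,
# so rank 2 is the second highest distinct salary
query = """
WITH ranked AS (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
SELECT DISTINCT salary AS second_highest_salary
FROM ranked
WHERE rnk = 2
"""
print(conn.execute(query).fetchall())
```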
    
---

## 🟨 Python – Day 3 (Functions, data structures, list ops)

1. **Remove duplicates from list while preserving order**
   Input: `[1, 2, 2, 3, 4, 1]` → Output: `[1, 2, 3, 4]`

    ``` python
    in_list = [1, 2, 2, 3, 4, 1]
    output = []
    for item in in_list:
        if item not in output:
            output.append(item)
    print(output)
    ```
    
2. **Anagram check**
   Write a function to check if two strings are anagrams (e.g., `"listen"` and `"silent"`).

    ``` python
        def anagrams(str1, str2):
            return sorted(str1) == sorted(str2)

        print(anagrams("listen", "silent"))

        def anagrams(str1, str2):
            str1 = str1.replace(" ", "").lower()
            str2 = str2.replace(" ", "").lower()
            return sorted(str1) == sorted(str2)
        
        print(anagrams("Listen", "Silent"))         # True
        print(anagrams("Dormitory", "Dirty room"))  # True

        def anagrams(str1, str2):
            str1 = str1.replace(" ", "").lower()
            str2 = str2.replace(" ", "").lower()
            if sorted(str1) == sorted(str2):
                return "They are anagrams"
            else:
                return "They are not anagrams"

        print(anagrams("listen", "silent"))
    ```
    
3. **Find all pairs with a given sum**
   From a list of integers, find all **unique pairs** that sum to a target value (e.g., target = 10).

    ``` python
        def sums(nums, target):
            seen = set()
            pairs = set()
            for num in nums:
                complement = target - num
                if complement in seen:
                    pairs.add(tuple(sorted((num, complement)))) # tuple(sorted()) ensures that 2,8 and 8,2 are treated same
                seen.add(num)
            return list(pairs)
        
        print(sums([1,2,3,4,5,6,7,8,9], 10))
    ```
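
In question 1, the `item not in Output` check scans the output list on every step, which is quadratic for long inputs. Since dictionary keys are unique and keep insertion order (guaranteed from Python 3.7), `dict.fromkeys` gives the same result in linear time, assuming the items are hashable. `dedupe_keep_order` is a hypothetical helper name:

```python
def dedupe_keep_order(items):
    # Later duplicates map to an existing key, so only first occurrences survive,
    # in their original order
    return list(dict.fromkeys(items))

print(dedupe_keep_order([1, 2, 2, 3, 4, 1]))
```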
    
---

## 🟩 Pandas – Day 3 (Apply logic, filtering, string ops)

1. **Use `.apply()` to format names**
   Convert all names in a column `name` to Title Case (e.g., `"rAm"` → `"Ram"`).

    ``` python
        import pandas as pd
        data = {'name': ['rAm', 'shYam', 'HARI'],
                'age' : [23,34,45]
                }
        df = pd.DataFrame(data)
        df['name']= df['name'].apply(lambda x:x.capitalize())
        print(df)
    ```
    
2. **Filter rows based on string contents**
   From column `email`, filter all rows where the domain is `gmail.com`.

    ``` python
    import pandas as pd

    data = {'name': ['rAm', 'shYam', 'HARI'],
            'age' : [23, 34, 45],
            'email': ['ram@gmail.com', 'shyam@yahoo.com', 'hari@gmail.com']
            }
    df = pd.DataFrame(data)

    # Option 1: plain suffix check; including '@' stops 'notgmail.com' from matching
    new_df = df[df['email'].str.endswith('@gmail.com')]
    print(new_df)

    # Option 2: regex anchored at the end; the raw string keeps the '\.' escape intact
    new_df = df[df['email'].str.contains(r'@gmail\.com$', case=False)]
    print(new_df)
    ```
    
3. **Groupby + filter**
   From `df(department, salary)`, find departments where the **average salary is more than 30000**.

    ``` python
        import pandas as pd

        data = {
            'name': ['ram', 'shyam', 'hari', 'sita'],
            'department': ['IT', 'Sales', 'IT', 'Sales'],
            'salary': [50000, 25000, 40000, 20000]
        }
        
        df = pd.DataFrame(data)

        new_df = df.groupby('department')['salary'].mean() # aggregation with one column returns series
        new_df = new_df[(new_df >30000)]
        print(new_df)

  
    ```
    

## 🟦 SQL – Day 4

1. **Find duplicate entries**
   From `students(name, roll_no)`, find students with duplicate names.

   ``` SQL
       SELECT name, COUNT(name) AS duplicate FROM students
       GROUP BY name
       HAVING COUNT(name) >1;
   ```

2. **Nth highest salary using window function**
   Retrieve the **3rd highest salary** from `employees`.

   ``` SQL
   SELECT name, salary
   FROM (
       SELECT name, salary,
              DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
       FROM employees
   ) AS ranked_salaries
   WHERE rnk = 3;
   ```

3. **Join and filter**
   From tables `orders(order_id, customer_id)` and `customers(customer_id, city)`, get order IDs placed by customers in 'Kathmandu'.

   ``` SQL
   SELECT o.order_id, c.city
   FROM orders o
   JOIN customers c ON o.customer_id = c.customer_id
   -- match the stored spelling: string comparison is case-sensitive in e.g. PostgreSQL
   WHERE c.city = 'Kathmandu';
   ```

4. **Case statement**
   From `employees`, create a new column called `salary_level`:

   * `"High"` if salary > 70000
   * `"Medium"` if 40000 < salary ≤ 70000
   * `"Low"` otherwise

   ``` SQL
   SELECT name, salary,
       CASE
           WHEN salary > 70000 THEN 'High'
           -- only reached when salary <= 70000, so this means 40000 < salary <= 70000
           WHEN salary > 40000 THEN 'Medium'
           ELSE 'Low'
       END AS salary_level
   FROM employees;
   ```

5. **Group by having multiple conditions**
   From `sales(region, amount)`, find regions where:

   * total sales > 100000
   * AND average sale > 10000

   ``` SQL
   SELECT region, SUM(amount) AS total_sales, AVG(amount) AS avg_sales
   FROM sales
   GROUP BY region
   HAVING SUM(amount) > 100000 AND AVG(amount) > 10000;
   ```
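
The choice of `DENSE_RANK()` in question 2 matters when salaries tie. A sketch comparing the three ranking functions on invented data in an in-memory SQLite database (again assuming SQLite 3.25+ for window-function support):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 900), ("b", 800), ("c", 800), ("d", 700)])

rows = conn.execute(
    """
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,  -- always unique
           RANK()       OVER (ORDER BY salary DESC) AS rnk,            -- leaves gaps after ties
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk       -- no gaps
    FROM employees
    ORDER BY salary DESC, name
    """
).fetchall()
for row in rows:
    print(row)
```

With the tie at 800, the 3rd highest salary is 700 under `DENSE_RANK()`, no row has `RANK() = 3`, and `ROW_NUMBER() = 3` lands on one of the tied 800 rows.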

---

## 🟨 Python – Day 4

1. **Find all vowels in a string (no duplicates)**
   Input: `"education"` → Output: `['e', 'u', 'a', 'i', 'o']`

   ``` python
   def vowels(s):
       out = []
       for ch in s:
           # keep each vowel once, in order of first appearance
           if ch in "aeiou" and ch not in out:
               out.append(ch)
       return out if out else ["No vowels present"]

   print(vowels("education"))  # ['e', 'u', 'a', 'i', 'o']
   ```

2. **Check for a palindrome ignoring case and space**
   Input: `"A man a plan a canal Panama"` → Output: `True`

   ``` python
   def palindrome(sen):
       # drop spaces and ignore case before comparing with the reversed string
       cleaned = sen.replace(" ", "").lower()
       return cleaned == cleaned[::-1]

   print(palindrome("A man a plan a canal Panama"))  # True
   ```

3. **Count character frequency (case-insensitive)**
   Input: `"Programming"` → Output: `{'p':1, 'r':2, 'o':1, ...}`

   ``` python
       def freq_count(word):
           word = word.lower()
           frequency = {}
           for item in word:
               if item in frequency:
                   frequency[item] +=1
               else:
                   frequency[item] = 1
           return frequency
       print(freq_count("Programming"))
   ```

4. **Custom sort a list of tuples by second element**
   Input: `[(1, 3), (2, 2), (3, 1)]` → Output: `[(3, 1), (2, 2), (1, 3)]`

   ``` python
       def sorting(data):
           sorted_data = sorted(data, key= lambda x: x[1])
           return sorted_data
       data = [(1, 3), (2, 2), (3, 1)]
       print(sorting(data))
   ```

5. **Flatten a nested list**
   Input: `[[1,2], [3,4], [5]]` → Output: `[1, 2, 3, 4, 5]`

   ``` python
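   data = [[1, 2], [3, 4], [5]]

   # List comprehension: outer loop over rows, inner loop over numbers
   flat = [num for row in data for num in row]
   print(flat)

   # Same result with explicit loops
   flat = []
   for row in data:
       for num in row:
           flat.append(num)
   print(flat)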
   ```
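
`itertools.chain.from_iterable` is another standard-library way to flatten a list of lists. Like the comprehension, it removes exactly one level of nesting; deeper structures need a recursive approach.

```python
from itertools import chain

data = [[1, 2], [3, 4], [5]]
# chain.from_iterable yields the items of each inner list in turn
flat = list(chain.from_iterable(data))
print(flat)
```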

---

## 🟩 Pandas – Day 4

1. **Capitalize only the first letter of names**
   From column `'name'` with values like `'rAM'`, convert to `'Ram'`.

   ``` python
       import pandas as pd

       data = {'name': ['rAm', 'SHYAM', 'hari']
              }
       df = pd.DataFrame(data)
       df['name'] = df['name'].str.capitalize()
       print(df)

       import pandas as pd
       data = {'name': ['rAm', 'SHYAM', 'hari']}
       df = pd.DataFrame(data)
        
       df['name'] = df['name'].str.title()
        
       print(df)

   ```

2. **Filter rows based on multiple string matches**
   From column `email`, keep only rows where the domain is either `'gmail.com'` or `'yahoo.com'`.

   ``` python
       import pandas as pd

       data = {'name': ['rAm', 'SHYAM', 'hari'],
               'email': ['ram@gmail.com', 'shyam@hotmail.com', 'hari@yahoo.com']
              }
       df = pd.DataFrame(data)
       new_df = df[(df['email'].str.contains(r'@gmail\.com', case = False)) |
                   (df['email'].str.contains(r'@yahoo\.com', case = False))]
       print(new_df)

       
       import pandas as pd

       data = {'name': ['rAm', 'SHYAM', 'hari'],
               'email': ['ram@gmail.com', 'shyam@hotmail.com', 'hari@yahoo.com']
              }
       df = pd.DataFrame(data)
       new_df = df[(df['email'].str.endswith('gmail.com'))| 
                    (df['email'].str.endswith('yahoo.com'))]
       print(new_df)
   ```

3. **Count of employees per department as new DataFrame**
   From `df(department, salary)`, create a new DataFrame showing department and employee count.

   ``` python
       import pandas as pd

       data = {'id': [1,2,3],
               'name': ['rAm', 'SHYAM', 'hari'],
               'email': ['ram@gmail.com', 'shyam@hotmail.com', 'hari@yahoo.com'],
               'department': ['it', 'admin', 'it'],
               'salary': [40000,50000,40000]
              }
       df = pd.DataFrame(data)
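       # size() counts rows per group; reset_index() turns the Series into a DataFrame
       new_df = df.groupby('department').size().reset_index(name='employee_count')
       print(new_df)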
   ```

4. **Create a new column ‘bonus’**
   If salary > 40000 → bonus is 5000, else 2000.

   ``` python
       import pandas as pd

       data = {'id': [1,2,3],
               'name': ['rAm', 'SHYAM', 'hari'],
               'email': ['ram@gmail.com', 'shyam@hotmail.com', 'hari@yahoo.com'],
               'department': ['it', 'admin', 'it'],
               'salary': [40000,50000,40000]
              }
       df = pd.DataFrame(data)
       # Use <= for the "else" case so a salary of exactly 40000 still gets a bonus
       df.loc[df['salary'] > 40000, 'bonus'] = 5000
       df.loc[df['salary'] <= 40000, 'bonus'] = 2000
       print(df)
   ```

5. **Replace all NULL/NaN values with a default**
   Replace all NaNs in DataFrame with `'Not Available'`.

   ``` python
       df.fillna('Not Available', inplace = True)
   ```


### 🟦 **SQL (Data Engineering-Oriented):**

1. **Find duplicate rows** in a table `logs` based on `timestamp` and `user_id`. Show only those with more than 1 occurrence.
    ``` SQL
        SELECT timestamp, user_id, COUNT(*) AS duplicate
        FROM logs
        GROUP BY timestamp, user_id
        HAVING COUNT(*) > 1
    ```
2. Write a query to **find the percentage of orders from each region** in the `orders` table.
    ``` SQL
        SELECT 
            region, ROUND(
            100.0 * Count(id) / SUM(COUNT(id)) OVER(),
            2
            ) AS percentage
        FROM orders
        GROUP BY region
        
    ```
3. Write a query to **get the latest transaction** per `customer_id` from a `transactions` table.
    ``` SQL
        SELECT customer_id, amount, timestamp 
        FROM (
            SELECT *, ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY timestamp DESC) as rn
            FROM transactions
            )t
        WHERE rn = 1;
    ```
4. Extract the **month and year** from a column `order_date` in `sales` and calculate **monthly revenue**.
    ``` SQL
    SELECT YEAR(order_date) AS years,
           MONTH(order_date) AS months,
           SUM(revenue) AS monthly_revenue
    FROM sales
    GROUP BY YEAR(order_date), MONTH(order_date)
    ORDER BY years, months;
    ```
5. Using `LEFT JOIN`, list all employees and their department names from `employees` and `departments` table, **including those without a department**.
    ``` SQL
    SELECT e.name AS employee_name, d.name AS dept_name
    FROM employees e
    LEFT JOIN departments d
    ON e.dept_id = d.id;
    ```
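
Question 4 uses MySQL's `YEAR()` and `MONTH()`. SQLite has neither, so this sketch extracts the parts with `strftime()` from ISO-8601 date strings and runs the aggregation through Python's `sqlite3` module; the `sales` rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2024-01-05", 100.0), ("2024-01-20", 50.0),
    ("2024-02-03", 75.0), ("2023-02-10", 10.0),
])

# strftime() returns text such as '2024' and '01'; CAST turns it into integers.
# GROUP BY collapses the rows so each (year, month) appears once.
rows = conn.execute(
    """
    SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS years,
           CAST(strftime('%m', order_date) AS INTEGER) AS months,
           SUM(revenue) AS monthly_revenue
    FROM sales
    GROUP BY strftime('%Y', order_date), strftime('%m', order_date)
    ORDER BY years, months
    """
).fetchall()
print(rows)
```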
---

### 🐍 **Python (Data Transformation + Utilities):**

1. Write a function that takes a CSV file path, reads it line by line, and returns a list of dictionaries (i.e., custom CSV reader).
    ``` python
        import csv
        
        def reading(filepath):
            with open(filepath, mode = 'r', newline = '', encoding = 'utf-8') as file:
                reader = csv.DictReader(file)
                return list(reader)
        data = reading('article.csv')
        print(data)
    ```
2. Write a function that **flattens a nested list** (e.g., `[[1, 2], [3, [4, 5]], 6]` → `[1, 2, 3, 4, 5, 6]`).
    ``` python
        def flattening(n):
            result = []
            for i in n:
                if isinstance(i,list):
                    result.extend(flattening(i))
                else:
                    result.append(i)
            return result
        
        flat = [[1,2],[3,[4,5]], 6]
        print(flattening(flat))
    ```
3. Parse a JSON string and extract a specific key’s value, e.g., `"user": {"name": "Ram", "age": 30}` → return `"Ram"`.
    ``` python
        import json

        json_string = '{"user": {"name": "Ram", "age": 30}}'
        
        # Parse JSON string into a Python dictionary
        data = json.loads(json_string)
        
        # Access nested keys
        print(data['user']['name'])
        # .get() returns a default instead of raising KeyError when a key is missing
        print(data.get('user',{}).get('name','not found' ))

    ```
4. Create a function to **check memory usage** of a Python list with 1 million integers.
    ``` python
        import sys

        def check_list_memory_usage():
            my_list = list(range(1_000_000))  # List of 1 million integers
            list_size = sys.getsizeof(my_list)
            elements_size = sum(sys.getsizeof(item) for item in my_list)
            total_size = list_size + elements_size
        
            print(f"List object size (shallow): {list_size / (1024 * 1024):.2f} MB")
            print(f"Total size including elements: {total_size / (1024 * 1024):.2f} MB")
        
        check_list_memory_usage()

    ```
5. Write a generator that yields even numbers from 1 to 100 and logs each to a file.
    ``` python
        def even_number_generator():
            with open("even_numbers.log", "w") as log_file:
                for number in range(2, 101, 2):  # Start at 2, step by 2 up to 100
                    log_file.write(f"{number}\n")  # Log to file
                    yield number  # Yield to caller
        
        for even in even_number_generator():
            print(even)
    ```
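
Question 1 asks for a custom reader, while the answer delegates to `csv.DictReader`. A hand-rolled sketch: `read_csv_as_dicts` is a hypothetical name, and it assumes simple comma-separated data with no quoted fields or embedded commas (the `csv` module handles those cases; this sketch does not). It accepts any iterable of lines, so an open file object works as well as the `io.StringIO` used here.

```python
import io

def read_csv_as_dicts(lines):
    # Hypothetical minimal reader: the first non-blank line is the header,
    # every later line becomes one dict keyed by the header names
    rows = []
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        values = line.split(",")
        if header is None:
            header = values
        else:
            rows.append(dict(zip(header, values)))
    return rows

sample = io.StringIO("name,age\nram,30\nsita,25\n")
print(read_csv_as_dicts(sample))
```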
---

### 🐼 **Pandas (ETL + Data Cleaning):**

1. Load a DataFrame from a dictionary and remove rows with missing values only if the `salary` is missing.
    ``` python
        import pandas as pd

        # Sample dictionary with some missing salary values
        data = {
            'name': ['Alice', 'Bob', 'Charlie', 'David'],
            'age': [25, 30, 35, 40],
            'salary': [50000, None, 60000, None]
        }
        
        # Create DataFrame
        df = pd.DataFrame(data)
        
        # Remove rows where 'salary' is missing (NaN)
        df_cleaned = df[df['salary'].notna()]
        
        # Display result
        print(df_cleaned)

    ```
2. Read a CSV file and filter rows where the column `status` is `'active'` and `last_login` is within the last 30 days.
    ``` python
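        import pandas as pd
        from datetime import datetime, timedelta

        df = pd.read_csv("sample.csv")

        # Parse last_login as datetime; unparseable values become NaT and are filtered out
        df['last_login'] = pd.to_datetime(df['last_login'], errors='coerce')

        # "Within the last 30 days" means on or after this cutoff
        cutoff_date = datetime.today() - timedelta(days=30)
        new_df = df[(df['status'] == 'active') & (df['last_login'] >= cutoff_date)]
        print(new_df)

        # Alternative, only if last_login already holds "days since last login" as a number:
        # new_df = df[(df['status'] == 'active') & (df['last_login'] <= 30)]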
    ```
3. Convert a column of strings `'2024-05-01 10:30'` to datetime and extract only the date part into a new column.
    ``` python
        import pandas as pd

        # Sample data
        df = pd.DataFrame({
            'datetime_str': ['2024-05-01 10:30', '2024-05-02 15:45', '2024-05-03 08:20']
        })
        
        # Convert to datetime
        df['datetime'] = pd.to_datetime(df['datetime_str'])
        
        # Extract date part only
        df['date_only'] = df['datetime'].dt.date
        
        print(df)

    ```
4. You have a DataFrame with duplicate rows. Remove all duplicates **except the latest one** based on `timestamp`.
    ``` python
        import pandas as pd

        # Sample DataFrame
        df = pd.DataFrame({
            'user_id': [1, 1, 2, 2, 3],
            'action': ['login', 'login', 'logout', 'logout', 'login'],
            'timestamp': [
                '2024-05-01 10:00',
                '2024-05-01 12:00',
                '2024-05-02 09:00',
                '2024-05-02 11:00',
                '2024-05-03 08:00'
            ]
        })
        
        # Convert timestamp column to datetime
        df['timestamp'] = pd.to_datetime(df['timestamp'])
        
        # Drop duplicates and keep the latest one based on timestamp
        df_deduplicated = df.sort_values('timestamp').drop_duplicates(
            subset=['user_id', 'action'], keep='last'
        )
        
        print(df_deduplicated)

    ```
5. Merge two DataFrames on `user_id` and fill missing values in the `email` column with `'no_email@domain.com'`.
    ``` python
        import pandas as pd
        
        # Sample DataFrames
        df1 = pd.DataFrame({
            'user_id': [1, 2, 3],
            'name': ['Alice', 'Bob', 'Charlie']
        })
        
        df2 = pd.DataFrame({
            'user_id': [1, 2],
            'email': ['alice@example.com', None]  # Bob's email is missing
        })
        
        # Merge on 'user_id'
        merged_df = pd.merge(df1, df2, on='user_id', how='left')
        
        # Fill missing emails
        merged_df['email'] = merged_df['email'].fillna('no_email@domain.com')
        
        print(merged_df)

    ```

### 🔷 SQL (Intermediate – Focus on Grouping, Joins, and Aggregates)

1. **Get the second highest salary** from an `employees` table using a subquery.
``` sql
SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
```
2. **List departments with more than 3 employees** from the `employees` table.
``` sql
SELECT dept_name, COUNT(id) AS employee_count
FROM employees
GROUP BY dept_name
HAVING COUNT(id) > 3;
```
3. Find employees whose **salary is above the department average salary**.
``` sql
-- Option 1: join each employee to a derived table of per-department averages
SELECT e.*
FROM employees e
JOIN (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
) d_avg
    ON e.department = d_avg.department
WHERE e.salary > d_avg.avg_salary;

-- Option 2: correlated subquery that computes the average of each employee's department
SELECT e.*
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department = e.department
);
```
4. Show each department and the **number of employees earning more than 50,000**.
``` sql
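SELECT department, COUNT(*) AS high_earners
FROM employees
WHERE salary > 50000
GROUP BY department;

-- To also list departments with zero such employees, count conditionally instead:
-- SELECT department, SUM(CASE WHEN salary > 50000 THEN 1 ELSE 0 END) AS high_earners
-- FROM employees GROUP BY department;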
```
5. Write a query to display **each employee’s name and their manager’s name** (assume `employees(manager_id)` references `employees(id)`).
``` sql
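-- The manager is also an employee, so their name comes from m.emp_name.
-- LEFT JOIN keeps employees who have no manager (manager_name is NULL for them).
SELECT e.emp_name, m.emp_name AS manager_name
FROM employees e
LEFT JOIN employees m
    ON e.manager_id = m.id;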
```
---

### 🐍 Python (Functions, Lists, Dicts, OOP)

1. Write a function to check if a string is a **palindrome**.
``` python

```
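One possible solution, as a minimal sketch: it assumes case, spaces, and punctuation should be ignored (so `"A man, a plan, a canal: Panama"` counts as a palindrome). For a strict character-by-character check, compare `text == text[::-1]` directly. The function name `is_palindrome` is our own choice.

```python
def is_palindrome(text):
    # Keep only letters and digits, lowercased, so spaces, punctuation and case are ignored
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    # A palindrome reads the same forwards and backwards
    return cleaned == cleaned[::-1]


print(is_palindrome("madam"))                           # True
print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("python"))                          # False
```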
2. Given a list of numbers, return a **list of squares of even numbers**.
``` python

```
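A minimal sketch using a list comprehension with a filter clause; it works on any list of integers, including negative ones. The function name `squares_of_evens` is hypothetical.

```python
def squares_of_evens(numbers):
    # Keep only the even numbers, then square each one
    return [n ** 2 for n in numbers if n % 2 == 0]


print(squares_of_evens([1, 2, 3, 4, 5, 6]))  # [4, 16, 36]
```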
3. Write a class `BankAccount` with `deposit()`, `withdraw()`, and `balance()` methods.
``` python

```
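A minimal sketch. Because the exercise asks for a `balance()` *method*, the amount is stored in `_balance` so the attribute does not shadow the method. Rejecting non-positive amounts and overdrafts with `ValueError` is an assumed policy, not part of the exercise.

```python
class BankAccount:
    def __init__(self, initial_balance=0):
        # Stored as _balance so it does not clash with the balance() method
        self._balance = initial_balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("Deposit amount must be positive")
        self._balance += amount

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive")
        if amount > self._balance:
            raise ValueError("Insufficient funds")
        self._balance -= amount

    def balance(self):
        return self._balance


account = BankAccount(100)
account.deposit(50)
account.withdraw(30)
print(account.balance())  # 120
```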
4. Create a dictionary from two lists: `keys = ['a', 'b']` and `values = [1, 2]`.
``` python

```
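A minimal sketch: `zip()` pairs the two lists element by element and `dict()` turns those pairs into key/value entries. Note that `zip` silently stops at the shorter list if the lengths differ; the dict comprehension `{k: v for k, v in zip(keys, values)}` is equivalent.

```python
keys = ['a', 'b']
values = [1, 2]

# zip pairs keys[i] with values[i]; dict() builds the mapping from those pairs
result = dict(zip(keys, values))
print(result)  # {'a': 1, 'b': 2}
```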
5. Write a function that takes a sentence and returns the **most frequent word**.
``` python

```
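A minimal sketch using `collections.Counter`. Lowercasing and extracting `\w+` tokens with `re.findall` is an assumed normalization (so `"The"` and `"the,"` count as the same word). On a tie, the word that appears first wins, and an empty sentence returns `None`. The name `most_frequent_word` is hypothetical.

```python
import re
from collections import Counter


def most_frequent_word(sentence):
    # Lowercase, then extract word tokens so punctuation is dropped ("the," -> "the")
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return None
    # most_common(1) returns [(word, count)]; on a tie the first-seen word wins
    return Counter(words).most_common(1)[0][0]


print(most_frequent_word("The cat and the hat, and the bat."))  # the
```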

---

### 📊 Pandas (Transformations, Grouping, Aggregation)

1. Load a DataFrame from CSV and **display only rows where salary > 60,000**.
``` python

```
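A minimal sketch. To keep it self-contained, the CSV content is an invented sample read through `io.StringIO`; with real data, pass the file path (for example a hypothetical `"employees.csv"`) to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Invented sample CSV; replace io.StringIO(csv_text) with a file path for real data
csv_text = """name,department,salary
Alice,IT,75000
Bob,HR,52000
Charlie,IT,61000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask keeps only rows whose salary is strictly greater than 60,000
high_earners = df[df['salary'] > 60000]
print(high_earners)
```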
2. **Group by department** and show the **average age** in each group.
``` python

```
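A minimal sketch on invented sample data: `groupby` splits the rows by `department`, and `.mean()` on the `age` column averages within each group. The result is a Series indexed by department; chain `.reset_index()` if you prefer a DataFrame.

```python
import pandas as pd

# Invented sample data
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'department': ['IT', 'HR', 'IT', 'HR'],
    'age': [25, 40, 35, 30],
})

# Group rows by department, then average the age column within each group
avg_age = df.groupby('department')['age'].mean()
print(avg_age)
```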
3. Replace all `NaN` values in a DataFrame with `'Unknown'`.
``` python

```
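A minimal sketch on invented sample data. `fillna('Unknown')` replaces every missing value (`NaN` or `None`) in every column; be aware that a numeric column such as `age` becomes `object` dtype once it mixes numbers and strings. Passing a dict such as `{'city': 'Unknown'}` limits the fill to specific columns.

```python
import numpy as np
import pandas as pd

# Invented sample data with missing values in several columns
df = pd.DataFrame({
    'name': ['Alice', None, 'Charlie'],
    'city': ['Pune', 'Delhi', np.nan],
    'age': [25, np.nan, 35],
})

# Replace every missing value in every column with the string 'Unknown'
df_filled = df.fillna('Unknown')
print(df_filled)
```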
4. Add a new column called `status` based on the `marks` column: `'Pass'` if `>= 40`, else `'Fail'`.
``` python

```
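A minimal sketch using `numpy.where`, which picks `'Pass'` where `marks >= 40` holds and `'Fail'` everywhere else; the sample data is invented. A missing mark compares as `False`, so it would be labelled `'Fail'`.

```python
import numpy as np
import pandas as pd

# Invented sample data; 40 is the pass boundary and counts as a pass
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'marks': [85, 39, 40]})

# Vectorized if/else over the whole column
df['status'] = np.where(df['marks'] >= 40, 'Pass', 'Fail')
print(df)
```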
5. Count how many times each value appears in the `department` column.
``` python

```
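A minimal sketch on invented sample data: `value_counts()` returns a Series mapping each department to how often it occurs, sorted from most to least frequent. Missing values are excluded unless you pass `dropna=False`.

```python
import pandas as pd

# Invented sample data
df = pd.DataFrame({'department': ['IT', 'HR', 'IT', 'Sales', 'IT', 'HR']})

# Count occurrences of each unique department
dept_counts = df['department'].value_counts()
print(dept_counts)
```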
