### Section 1: Data and Python Basics

#### 1a. Explain the term data and data science.
**Data** refers to raw, unprocessed facts, figures, or information collected from various sources. It can be:
- **Structured**: Organized data, like spreadsheets or databases.
- **Semi-structured**: Data with some organizational structure, like JSON or XML.
- **Unstructured**: Data with no predefined format, such as images or videos.

**Data Science** is a multidisciplinary field that uses scientific methods, algorithms, and systems to extract insights from structured and unstructured data. It involves:
- Data collection
- Data cleaning and preprocessing
- Analysis and modeling
- Visualization and interpretation of results

#### 1b. Python is considered superior compared to other traditional programming languages for analysis. Justify.
Python's popularity in data analysis stems from:
- **Ease of Learning**: Its syntax is simple and readable.
- **Extensive Libraries**: Libraries like pandas, NumPy, Matplotlib, and scikit-learn make data manipulation and analysis straightforward.
- **Versatility**: It supports data analysis, machine learning, web development, and more.
- **Interoperability**: Integrates seamlessly with other languages and tools.
- **Community Support**: Offers extensive documentation and tutorials.

#### 1c. List and explain different arithmetic operators available in Python.
Python provides the following arithmetic operators:
| **Operator** | **Description**        | **Example**      | **Output** |
|--------------|------------------------|------------------|------------|
| `+`          | Addition               | `3 + 2`          | `5`        |
| `-`          | Subtraction            | `3 - 2`          | `1`        |
| `*`          | Multiplication         | `3 * 2`          | `6`        |
| `/`          | Division               | `3 / 2`          | `1.5`      |
| `//`         | Floor Division         | `3 // 2`         | `1`        |
| `%`          | Modulus (Remainder)    | `3 % 2`          | `1`        |
| `**`         | Exponentiation         | `3 ** 2`         | `9`        |
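
The table uses positive operands only. As a small sketch of how the same operators behave with a negative operand (the variable names are illustrative), note that `//` rounds toward negative infinity and `%` takes the sign of the divisor:

```python
# Floor division rounds down (toward negative infinity), not toward zero.
floor_pos = 7 // 2      # 3
floor_neg = -7 // 2     # -4

# The remainder takes the sign of the divisor, so that
# (a // b) * b + (a % b) == a always holds.
mod_pos = 7 % 2         # 1
mod_neg = -7 % 2        # 1

# True division always returns a float, even for exact results.
true_div = 6 / 3        # 2.0

print(floor_pos, floor_neg, mod_pos, mod_neg, true_div)
```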

#### 1d. Demonstrate the concept of accepting input from the user.
Input allows interaction between the user and the program. Example:
```python
name = input("Enter your name: ")
age = int(input("Enter your age: "))
print(f"Hello, {name}! You are {age} years old.")
```
- `input()`: Reads input as a string.
- `int()`: Converts input to an integer.

---

### Section 2: Lists, Dictionaries, and Loops

#### 2a. Explain the concept of list in Python.
A **list** is a collection of ordered, mutable items. It allows elements of different data types. Example:
```python
fruits = ["apple", "banana", "cherry"]
fruits.append("orange")
print(fruits)
```
- **Methods**: `append()`, `remove()`, `sort()`, etc.

#### 2b. Explain the concept of dictionary in Python.
A **dictionary** stores key-value pairs, allowing efficient data retrieval. Example:
```python
person = {"name": "Alice", "age": 25}
person["city"] = "New York"
print(person)
```
- **Methods**: `keys()`, `values()`, `items()`, etc.

#### 2c. Program to check whether a number is divisible by 2 or 3.
```python
num = int(input("Enter a number: "))
if num % 2 == 0 or num % 3 == 0:
    print(f"{num} is divisible by 2 or 3.")
else:
    print(f"{num} is not divisible by 2 or 3.")
```
- `num % 2 == 0`: Checks divisibility by 2.
- `num % 3 == 0`: Checks divisibility by 3.

#### 2d. Demonstrate the working of a for loop in Python.
A **for loop** iterates over sequences like lists, strings, or ranges. Example:
```python
for i in range(5):
    print(f"Iteration {i + 1}")
```
- `range(5)`: Generates numbers from 0 to 4.

---

### Section 3: Mathematical and Statistical Concepts

#### 3a. Permutations and combinations with Python programs.
```python
from math import perm, comb
print("Permutations of 5 items taken 3 at a time:", perm(5, 3))
print("Combinations of 5 items taken 3 at a time:", comb(5, 3))
```
- **Permutations**: Order matters.
- **Combinations**: Order doesn’t matter.
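
To make the distinction concrete, the following sketch enumerates the arrangements and selections with the standard-library `itertools` module and checks the counts against the formulas `n! / (n - r)!` for permutations and `n! / (r! * (n - r)!)` for combinations. The item labels are made up for illustration.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

items = ["A", "B", "C", "D", "E"]  # illustrative labels
n, r = len(items), 3

# Every ordered arrangement of 3 items: ('A', 'B', 'C') and ('B', 'A', 'C') both count.
arrangements = list(permutations(items, r))
# Every unordered selection of 3 items: ('A', 'B', 'C') appears only once.
selections = list(combinations(items, r))

# The enumerated counts agree with math.perm / math.comb and the factorial formulas.
assert len(arrangements) == perm(n, r) == factorial(n) // factorial(n - r)
assert len(selections) == comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r))

print(len(arrangements), len(selections))  # 60 10
```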

#### 3b. Explain quartiles with a Python program.
Quartiles divide data into four equal parts:
```python
import numpy as np
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print("Q1:", q1, "Q2 (Median):", q2, "Q3:", q3)
```
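
A common follow-up is the interquartile range (IQR = Q3 - Q1). The sketch below uses the standard-library `statistics.quantiles` function instead of NumPy. With `method="inclusive"` it interpolates linearly between data points, and for this data it gives the same quartiles as the NumPy example.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four groups and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
```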

#### 3c. Hypothesis testing example.
**Hypothesis testing** evaluates claims about a population. Example:
```python
from scipy.stats import ttest_ind
sample1 = [1, 2, 3, 4, 5]
sample2 = [3, 4, 5, 6, 7]
stat, p_value = ttest_ind(sample1, sample2)
print("P-value:", p_value)
```
- If `p_value < 0.05`, reject the null hypothesis at the 5% significance level; otherwise, there is not enough evidence to reject it.
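
To show roughly what `ttest_ind` computes, here is a minimal sketch of the pooled-variance (Student's) two-sample t statistic using only the standard library. It assumes equal population variances, which is the default setting of `ttest_ind`. The p-value requires the t distribution and is left to SciPy.

```python
from math import sqrt
from statistics import mean, variance

sample1 = [1, 2, 3, 4, 5]
sample2 = [3, 4, 5, 6, 7]
n1, n2 = len(sample1), len(sample2)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means.
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (mean(sample1) - mean(sample2)) / std_err
dof = n1 + n2 - 2  # degrees of freedom

print("t statistic:", t_stat, "degrees of freedom:", dof)
```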

#### 3d. Dimensions in arrays with examples.
```python
import numpy as np
arr1D = np.array([1, 2, 3])  # 1D
arr2D = np.array([[1, 2], [3, 4]])  # 2D
arr3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 3D
print(arr1D.ndim, arr2D.ndim, arr3D.ndim)
```
- `ndim`: Returns the number of dimensions.

---

### Section 4: Data Types, Wrangling, and Visualization

#### 4a. Distinguish between discrete and continuous data.
| **Aspect**     | **Discrete Data**             | **Continuous Data**          |
|----------------|-------------------------------|------------------------------|
| Nature         | Countable                    | Measurable                  |
| Example        | Number of cars               | Temperature                 |
| Representation | Bar Chart                    | Line Chart/Histogram        |

#### 4b. Explain data wrangling using pandas with an example.
Data wrangling prepares data for analysis:
```python
import pandas as pd
data = {"Name": ["Alice", "Bob", None], "Age": [25, None, 30]}
df = pd.DataFrame(data)
df = df.fillna({"Name": "Unknown", "Age": df["Age"].mean()})
print(df)
```

#### 4c. Explain pie chart using Matplotlib.
```python
import matplotlib.pyplot as plt
labels = ["Math", "Science", "English"]
sizes = [30, 45, 25]
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title("Subject Proportions")
plt.show()
```

#### 4d. What are data structures in R? Explain with Python examples.
**R Data Structures** include:
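- **Vectors**: Ordered collections whose elements all share one type. The closest Python analogue is a NumPy array; a Python list can mix types, which makes it closer to an R list.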
- **Data Frames**: Similar to pandas DataFrames.
```python
import pandas as pd
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
print(df)
```

---

### Section 5: File Handling, Distributions, and Functions

#### 5a. Explain the process of importing files in R.
In R, you can use the `read.csv` function to import files:
```R
data <- read.csv("file.csv")
head(data)
```
Equivalent Python code:
```python
import pandas as pd
data = pd.read_csv("file.csv")
print(data.head())
```

#### 5b. Distinguish between symmetrical and asymmetrical distribution.
- **Symmetrical Distribution**: Data is evenly distributed around the mean (e.g., normal distribution).
- **Asymmetrical Distribution**: Data is skewed to the left or right.
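
One quick, informal check for asymmetry is to compare the mean with the median: they coincide for symmetric data, while a long right tail typically pulls the mean above the median. The two small datasets below are invented for illustration.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 12]  # one large value stretches the right tail

gaps = {}
for name, values in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    # mean - median is 0 for the symmetric data and positive for the right-skewed data.
    gaps[name] = mean(values) - median(values)
    print(f"{name}: mean={mean(values)}, median={median(values)}, gap={gaps[name]}")
```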

#### 5c. Program to explain built-in and user-defined functions.
```python
# Built-in function
print(len("Hello"))

# User-defined function
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))
```

#### 5d. Explain Python casting with an example.
Casting converts one data type to another:
```python
x = "123"
y = int(x)  # Casting string to integer
print(y + 10)
```
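
Casting can also lose information or fail. The following sketch, with illustrative variable names, shows truncation when converting a float to an integer and the `ValueError` raised when the text is not a valid integer:

```python
# float -> int truncates toward zero rather than rounding.
truncated = int(12.9)        # 12

# A string holding a decimal number must go through float() first.
as_float = float("12.5")     # 12.5

# int() raises ValueError when the text is not a valid integer literal.
try:
    int("12.5")
    cast_failed = False
except ValueError:
    cast_failed = True

print(truncated, as_float, cast_failed)
```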

---

### Section 6: Advanced Concepts and Analysis

#### 6a. Define data science. List and explain four types of analysis.
**Data Science** is the discipline of extracting actionable insights from structured and unstructured data. It leverages methods from statistics, computer science, and domain expertise.

**Types of Analysis in Data Science:**
1. **Descriptive Analysis**:
   - Summarizes historical data to identify trends and patterns.
   - Example: Calculating average sales over a month.
2. **Diagnostic Analysis**:
   - Explores why certain events occurred.
   - Example: Analyzing reasons for a dip in sales.
3. **Predictive Analysis**:
   - Forecasts future outcomes using historical data.
   - Example: Predicting next month’s sales based on previous data.
4. **Prescriptive Analysis**:
   - Suggests actions to achieve desired outcomes.
   - Example: Recommending strategies to increase customer retention.

#### 6b. Explain the concept of the following list-based functions with examples.
```python
# List of numbers
numbers = [5, 3, 8, 6]

# len()
print("Length of list:", len(numbers))

# copy()
numbers_copy = numbers.copy()
print("Copied list:", numbers_copy)

# append()
numbers.append(10)
print("List after appending:", numbers)

# sort()
numbers.sort()
print("Sorted list:", numbers)
```

#### 6c. Explain the concept of tuple in Python.
A **tuple** is an immutable, ordered collection of items. It is similar to a list but cannot be modified after creation. Example:
```python
tuple_example = (1, 2, 3, "apple")
print(tuple_example[1])
```
- Use cases: Data that should not change, like geographic coordinates.
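
As a brief sketch of immutability in practice (the coordinate values are only an example), unpacking a tuple works, but assigning to one of its items raises `TypeError`:

```python
point = (40.7128, -74.0060)  # illustrative (latitude, longitude) pair

# Unpacking assigns each element to its own name.
latitude, longitude = point

# Item assignment is not supported on tuples, so this raises TypeError.
try:
    point[0] = 0.0
    modified = True
except TypeError:
    modified = False

print(latitude, longitude, modified)
```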

#### 6d. List and explain different control structures available in Python.
1. **Conditional Statements**: `if`, `elif`, `else` for decision-making.
   ```python
   x = 10
   if x > 5:
       print("x is greater than 5")
   else:
       print("x is less than or equal to 5")
   ```

2. **Loops**:
   - **For Loop**: Iterates over a sequence.
     ```python
     for i in range(3):
         print(i)
     ```
   - **While Loop**: Repeats as long as a condition is true.
     ```python
     count = 0
     while count < 3:
         print(count)
         count += 1
     ```

3. **Control Statements**:
   - `break`: Exits a loop.
   - `continue`: Skips the current iteration.
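
The two control statements can be sketched in a single loop; the list name `visited` is illustrative:

```python
visited = []
for n in range(1, 10):
    if n % 2 == 0:
        continue  # skip even numbers and move on to the next iteration
    if n > 7:
        break     # leave the loop entirely once an odd n exceeds 7
    visited.append(n)

print(visited)  # [1, 3, 5, 7]
```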

#### 6e. Briefly discuss the concept of skewness and kurtosis with examples.
- **Skewness** measures asymmetry in data distribution:
  - Positive skew: Tail is on the right.
  - Negative skew: Tail is on the left.

- **Kurtosis** measures how heavy the tails of a distribution are (it is often loosely described as the sharpness of the peak):
  - High kurtosis: Data has heavy tails.
  - Low kurtosis: Data has light tails.

Example:
```python
import scipy.stats as stats
import numpy as np

data = [1, 2, 2, 3, 4, 7, 9]
print("Skewness:", stats.skew(data))
print("Kurtosis:", stats.kurtosis(data))
```
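
As a rough illustration of what these two functions compute, the sketch below derives both quantities from central moments using only the standard library. It assumes the population (biased) definitions and reports excess kurtosis (kurtosis minus 3), which are the default settings of `stats.skew` and `stats.kurtosis`. The helper `central_moment` is a hypothetical name introduced here.

```python
from statistics import fmean

data = [1, 2, 2, 3, 4, 7, 9]
mu = fmean(data)

def central_moment(values, center, k):
    """Average of the k-th powers of the deviations from the center."""
    return fmean([(v - center) ** k for v in values])

m2 = central_moment(data, mu, 2)  # population variance
m3 = central_moment(data, mu, 3)
m4 = central_moment(data, mu, 4)

skewness = m3 / m2 ** 1.5            # > 0 means a longer right tail
excess_kurtosis = m4 / m2 ** 2 - 3   # 0 for a normal distribution

print("Skewness:", round(skewness, 3))
print("Excess kurtosis:", round(excess_kurtosis, 3))
```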

#### 6f. Define mean, median, and mode with examples.
1. **Mean**: Average value.
   ```python
   data = [1, 2, 3, 4, 5]
   print("Mean:", sum(data) / len(data))
   ```
2. **Median**: Middle value when data is sorted.
   ```python
   import numpy as np

   data = [1, 3, 3, 6, 7, 8, 9]
   print("Median:", np.median(data))
   ```
3. **Mode**: Most frequent value.
   ```python
   from statistics import mode
   data = [1, 1, 2, 3, 4]
   print("Mode:", mode(data))
   ```

#### 6g. Explain inferential statistics using NumPy.
Inferential statistics involves drawing conclusions about a population based on sample data. Example:
```python
import numpy as np
sample = np.random.normal(50, 10, 100)
print("Sample Mean:", np.mean(sample))
```
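
A sample mean alone says nothing about its uncertainty. One common inferential step is a confidence interval for the population mean. The sketch below uses only the standard library and a normal approximation (reasonable for a sample of 100); the seed and the 95% level are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(50, 10) for _ in range(100)]

n = len(sample)
sample_mean = mean(sample)
std_err = stdev(sample) / n ** 0.5  # standard error of the mean

# z value for a two-sided 95% interval (about 1.96).
z = NormalDist().inv_cdf(0.975)
lower, upper = sample_mean - z * std_err, sample_mean + z * std_err

print(f"Sample mean: {sample_mean:.2f}")
print(f"95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```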

#### 6h. Explain NumPy array slicing with a program in Python.
```python
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print("Slice from index 1 to 3:", arr[1:4])
```

#### 6i. Explain NOIR in detail.
**NOIR** stands for four types of data measurement scales:
1. **Nominal**: Categories without a specific order (e.g., colors).
2. **Ordinal**: Ordered categories (e.g., rankings).
3. **Interval**: Ordered and equidistant values without a true zero (e.g., temperature in Celsius).
4. **Ratio**: Ordered, equidistant values with a true zero (e.g., height, weight).

#### 6j. Why use R?
- Specialized in statistical analysis and visualization.
- Extensive libraries for data manipulation (e.g., dplyr) and plotting (e.g., ggplot2).
- Ideal for academic and research purposes.

#### 6k. Explain Matplotlib with a simple plot and bar plot.
```python
import matplotlib.pyplot as plt

# Simple Plot
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title("Simple Plot")
plt.show()

# Bar Plot
plt.bar(x, y)
plt.title("Bar Plot")
plt.show()
```

### Section 1: Data and Python Basics

#### 1a. Explain the term data and data science.
- **Data**: Information collected for reference or analysis. Examples include sales records, sensor readings, and survey responses.
- **Data Science**: A multidisciplinary field that applies statistical, computational, and domain-specific knowledge to extract insights from data. It involves tools like Python, R, machine learning, and visualization techniques.

#### 1b. Python is considered superior compared to other traditional programming languages for analysis. Justify.
- **User-Friendly**: Python's syntax is simple, making it easier to learn.
- **Extensive Libraries**: Includes libraries like NumPy, pandas, and scikit-learn for data manipulation and modeling.
- **Versatile**: Used for web development, automation, and data analysis.
- **Community Support**: Active and growing support base.

### Section 2: Lists, Dictionaries, and Loops

#### 2a. Explain the concept of list in Python.
- A list is a collection of ordered, mutable items.
- Example: `fruits = ["apple", "banana", "cherry"]`

#### 2b. Explain the concept of dictionary in Python.
- A dictionary is a mutable collection of key-value pairs. Since Python 3.7, it preserves the order in which keys were inserted.
- Example: `person = {"name": "Alice", "age": 25}`

### Section 4: Data Types, Wrangling, and Visualization

#### 4a. Distinguish between discrete and continuous data.
- **Discrete Data**: Countable values, e.g., number of students.
- **Continuous Data**: Measurable values, e.g., height or weight.

| **Aspect**     | **Discrete Data** | **Continuous Data**  |
|----------------|-------------------|----------------------|
| Nature         | Countable         | Measurable           |
| Example        | Number of cars    | Temperature          |
| Representation | Bar Chart         | Line Chart/Histogram |

### Brief Definitions for Essential Technology for Data Science

#### Section 1: Data and Python Basics

1. **Data**: Raw, unorganized facts or information that can be structured, semi-structured, or unstructured.
   
2. **Data Science**: A multidisciplinary field for extracting insights from data using methods like data preprocessing, modeling, and visualization.

3. **Python Advantages**: Python is popular for data analysis due to its simplicity, versatile libraries (like pandas and NumPy), and strong community support.

4. **Arithmetic Operators**: Basic operations include addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), modulus (`%`), exponentiation (`**`), and floor division (`//`).

5. **User Input in Python**: Use `input()` to get user data and process it in a program.

---

#### Section 2: Lists, Dictionaries, and Loops

1. **List**: An ordered, mutable collection of items (e.g., `[1, 2, 3]`).

2. **Dictionary**: A key-value pair data structure for fast lookups (e.g., `{"name": "Alice"}`).

3. **For Loop**: Iterates over sequences like lists or ranges (e.g., `for i in range(5): print(i)`).

4. **Divisibility Check**: Logic to test divisibility by 2 or 3 using the modulus operator (`%`).

---

#### Section 3: Mathematical and Statistical Concepts

1. **Permutations and Combinations**: Mathematical methods to count arrangements or group selections.

2. **Quartiles**: Divide data into four equal parts, with Q1, Q2 (median), and Q3 as critical points.

3. **Hypothesis Testing**: A statistical method to test assumptions about population parameters (e.g., t-tests).

4. **Array Dimensions**: NumPy arrays can be 1D (a vector), 2D (a matrix), 3D, or higher; the `ndim` attribute gives the number of dimensions (axes).

---

#### Section 4: Data Types, Wrangling, and Visualization

1. **Discrete vs. Continuous Data**: Discrete data is countable, while continuous data is measurable.

2. **Data Wrangling**: Cleaning and preparing raw data for analysis using tools like pandas.

3. **Pie Charts**: Visualize data proportions with matplotlib.

4. **R Data Structures**: Include vectors, data frames, and lists. Data frames are similar to pandas DataFrames, and vectors are closest to NumPy arrays because all their elements share one type.

---

#### Section 5: File Handling, Distributions, and Functions

1. **File Importing in R**: Use `read.csv()` to load data into R, similar to pandas in Python.

2. **Distributions**: Symmetrical distributions have balanced data around the mean; asymmetrical distributions are skewed.

3. **Functions**: Built-in functions are pre-defined (e.g., `len()`), while user-defined functions are created for custom tasks.

4. **Python Casting**: Converting one data type to another (e.g., `int("123")` to convert a string to an integer).

---

#### Section 6: Advanced Concepts and Analysis

1. **Types of Analysis in Data Science**:
   - Descriptive: Summarizes data.
   - Diagnostic: Identifies causes.
   - Predictive: Forecasts outcomes.
   - Prescriptive: Suggests actions.

2. **Tuple**: Immutable ordered collections in Python (e.g., `(1, 2, 3)`).

3. **Control Structures**: Include conditional statements (`if`), loops (`for`, `while`), and exception handling.

4. **Skewness and Kurtosis**: Skewness measures the asymmetry of a distribution; kurtosis measures how heavy its tails are.

5. **Mean, Median, Mode**: Central tendency measures; mean (average), median (middle value), and mode (most frequent value).

6. **Inferential Statistics**: Draws conclusions about populations using data samples (e.g., via numpy).

7. **NumPy Array Slicing**: Extracts subarrays using syntax like `arr[start:stop]`.

8. **NOIR (Data Measurement Scales)**:
   - Nominal: Categories (e.g., colors).
   - Ordinal: Ordered categories (e.g., rankings).
   - Interval: Scaled data with equal intervals (e.g., temperature).
   - Ratio: Includes true zero (e.g., weight).

9. **Matplotlib**: Python library for creating visualizations, such as line and bar plots.

10. **ggplot2**: An R library for advanced data visualizations.



**Module I**

### Unit 1: Introduction to Data Science and Python

#### a) **Introduction to Data Science**
- **Definition**: Data Science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
- **Data Science Life Cycle**:
  1. Data Collection
  2. Data Preparation
  3. Data Analysis and Exploration
  4. Modeling and Evaluation
  5. Deployment
  6. Maintenance
- **Applications**:
  - Healthcare: Predictive analytics, personalized medicine
  - Finance: Fraud detection, risk management
  - Marketing: Customer segmentation, recommendation systems
  - E-commerce: Demand forecasting, supply chain optimization
- **Advantages of Python**:
  - Open-source and free
  - Extensive libraries (e.g., NumPy, Pandas, Matplotlib, Scikit-learn)
  - Easy-to-read syntax
  - Strong community support

#### b) **Python Basics**
- **What is Python?**
  - A high-level, interpreted programming language designed for readability and versatility.
- **Why Learn Python?**
  - Easy to learn, versatile, and widely used in Data Science.
- **Installing Python**:
  - Download from [python.org](https://www.python.org/).
  - Install IDEs like PyCharm, Jupyter Notebook, or VS Code.
- **Executing Python Programs**:
  - Run scripts using terminal or IDE.
- **Writing Your First Program**:
```python
print("Hello, World!")
```

#### c) **Basic Programming Elements of Python**
- **Variables and Constants**:
  - Variables: Named locations to store data (e.g., `x = 5`)
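  - Constants: Values meant to stay unchanged. Python does not enforce this, so constants are marked by convention with uppercase names (e.g., `PI = 3.14`).
- **Identifiers**:
  - Rules: Start with a letter or underscore, contain only letters, digits, and underscores (no spaces), and are case-sensitive. Reserved keywords such as `if` or `for` cannot be used.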
- **Typecasting**:
  - Convert data types (e.g., `int("5")` to `5`)
- **Indentation**:
  - Essential for defining blocks of code.
- **Comments**:
  - Single-line: `# This is a comment`
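  - Multi-line: Prefix each line with `#`. A triple-quoted string such as `""" This is a multi-line comment """` is technically a string literal, but it is often used for the same purpose.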
- **Primitive Data Types**:
  - `int`, `float`, `str`, `bool`
- **Command-Line Programs**:
```python
name = input("Enter your name: ")
print(f"Hello, {name}!")
```

#### d) **Operators in Python**
- **Arithmetic Operators**: `+`, `-`, `*`, `/`, `//`, `%`, `**`
- **Relational Operators**: `==`, `!=`, `<`, `>`, `<=`, `>=`
- **Logical Operators**: `and`, `or`, `not`
- **Membership Operators**: `in`, `not in`
- **User Input**:
```python
age = int(input("Enter your age: "))
print(f"You are {age} years old.")
```
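
The relational, logical, and membership operators listed above can be sketched together as follows; the variable names and values are illustrative:

```python
age = 20
languages = ["Python", "R"]

is_adult = age >= 18                    # relational: True
is_teen = age >= 13 and age <= 19       # logical and: False, because 20 > 19
not_teen = not is_teen                  # logical not: True
adult_or_teen = is_adult or is_teen     # logical or: True
knows_python = "Python" in languages    # membership: True
no_julia = "Julia" not in languages     # membership: True

print(is_adult, is_teen, not_teen, adult_or_teen, knows_python, no_julia)
```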

### Unit 2: Data Structures and Control Flow

#### a) **Collection Data Structures**
- **Lists**: Mutable, ordered collections.
```python
my_list = [1, 2, 3]
```
- **Tuples**: Immutable, ordered collections.
```python
my_tuple = (1, 2, 3)
```
- **Dictionaries**: Key-value pairs.
```python
my_dict = {"key": "value"}
```
- **Sets**: Unordered, unique elements.
```python
my_set = {1, 2, 3}
```
- **Strings**: Immutable sequences of characters.
```python
my_string = "Hello"
```

#### b) **Control Flow**
- **Types**:
  - Sequential
  - Branching: `if`, `elif`, `else`
  - Iteration: `for`, `while`
  - Modular: Functions
- **Examples**:
```python
# Conditional
x = 15  # example value so the snippet runs on its own
if x > 10:
    print("x is greater than 10")

# Loop
for i in range(5):
    print(i)
```

#### c) **User-defined Functions**
- **No Value Pass and No Return**:
```python
def greet():
    print("Hello")
```
- **Value Pass and No Return**:
```python
def greet(name):
    print(f"Hello, {name}")
```
- **Value Pass and Return**:
```python
def add(a, b):
    return a + b
```
- **Default Arguments**:
```python
def greet(name="Guest"):
    print(f"Hello, {name}")
```
- **Variable Arguments**:
```python
def sum_all(*args):
    return sum(args)
```
- **Higher-order Functions**:
```python
nums = [1, 2, 3]
squared = list(map(lambda x: x**2, nums))  # map() returns a lazy iterator; list() collects the results
```
- **List Comprehension**:
```python
squared = [x**2 for x in range(10)]
```

**Module II**

### Unit 3: Statistics for Data Analysts

#### a) **Basic Concepts**
- **Permutations and Combinations**:
  - Permutations: Order matters.
  - Combinations: Order does not matter.
- **Probability**:
  - Basic probability rules, conditional probability.
- **Descriptive Statistics**:
  - Mean, Median, Mode
  - Point Estimation: Estimating population parameters.
  - Quartiles and Boxplot: Visualizing data spread.
  - Methods of Dispersion: Variance, Standard Deviation, Range.
- **Random Variables and Probability Distribution**:
  - Discrete and Continuous distributions.
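
Conditional probability can be illustrated by counting outcomes. This sketch assumes two fair six-sided dice and asks for the probability that the sum is 8 given that the first die is even; the events A and B are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

event_b = [o for o in outcomes if o[0] % 2 == 0]   # B: first die is even
a_and_b = [o for o in event_b if sum(o) == 8]       # A and B: the sum is also 8

# P(A | B) = P(A and B) / P(B)
p_b = Fraction(len(event_b), len(outcomes))
p_a_and_b = Fraction(len(a_and_b), len(outcomes))
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)  # 1/6
```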

#### b) **Measures of Shape**
- **Skewness**: Measure of data asymmetry.
- **Kurtosis**: Measure of data tails.
- **Outlier Detection**:
  - Z-score, IQR method.
- **Transformations**:
  - Log, Square root transformations.
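
The IQR method for outlier detection can be sketched with the standard library alone. The readings are made up, and the 1.5 multiplier is the conventional choice (Tukey's rule) rather than anything fixed by these notes.

```python
from statistics import quantiles

values = [10, 12, 11, 13, 12, 11, 95]  # made-up readings; 95 looks suspicious

q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Flag points that lie more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```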

#### c) **Inferential Statistics**
- **Sampling Techniques**: Random, Stratified, Cluster.
- **Hypothesis Testing**:
  - Null and Alternative Hypotheses.
  - p-value, significance levels.
- **Z-score Normalization**: Standardizing data.
- **Correlation and ANOVA**:
  - Analyzing relationships and group differences.
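
Z-score normalization, listed above, rescales values to have mean 0 and standard deviation 1. A minimal sketch with illustrative data, using the population standard deviation (the sample standard deviation is also a common choice):

```python
from statistics import fmean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

mu = fmean(values)
sigma = pstdev(values)  # population standard deviation

# z = (x - mean) / standard deviation
z_scores = [(v - mu) / sigma for v in values]

print(z_scores)
```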

#### d) **Introduction to NumPy**
- **Creating Arrays**:
```python
import numpy as np
arr = np.array([1, 2, 3])
```
- **Indexing and Slicing**:
```python
print(arr[0:2])
```
- **Vectorization**: Element-wise operations.
- **Boolean Indexing**:
```python
print(arr[arr > 2])
```
- **Statistical Functions**:
  - Mean, Median, Standard Deviation using NumPy.

### Unit 4: Data Wrangling Using Pandas

#### a) **Introduction to Data**
- **NOIR**:
  - Nominal, Ordinal, Interval, Ratio scales.
- **Types of Data Analysis**:
  - Descriptive, Diagnostic, Predictive, Prescriptive.
- **Continuous vs Discrete Data**:
  - Continuous: Can take any value within a range (e.g., height).
  - Discrete: Specific countable values.

#### b) **Data Wrangling Using Pandas**
- **Creating Series and DataFrames**:
```python
import pandas as pd
series = pd.Series([1, 2, 3])
data = {"A": [1, 2], "B": [3, 4]}
df = pd.DataFrame(data)
```
- **DataFrame Attributes**:
  - `.shape`, `.columns`, `.info()`
- **Operations**:
  - Add/Drop Columns and Rows.
  - Indexing with `.iloc` and `.loc`.
  - Conditional Selection.
  - Groupby and Summary Operations.
  - Sorting Data.

#### c) **Introduction to R IDE**
- **Components**: Console, Script Editor, Environment, Plots.
- **Basic Data Types and Structures**:
  - Vectors, Matrices, Data Frames, Lists.
  - Data Coercion.
- **File Importing and Visualization**:
  - Use `read.csv()` for importing files.
  - Create visualizations with `ggplot2`.

#### d) **Basic Visualization Using Matplotlib**
- **Chart Components**:
  - Title, Axis Labels, Legends.
- **Charts**:
  - Line Chart, Scatter Plot, Pie Chart.
- **Subplots**:
```python
import matplotlib.pyplot as plt
plt.subplot(1, 2, 1)
plt.plot([1, 2, 3])
plt.subplot(1, 2, 2)
plt.scatter([1, 2, 3], [4, 5, 6])
plt.show()
```

