# TechFlow Data Analysis - Module 1
## Learn Pandas by Doing

**Your Role:** Data Analyst at TechFlow (B2B SaaS Company)

**Your Task:** Answer business questions using Python + Pandas

---

# SETUP - Run these first

In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv('/Users/prithambhosale/Downloads/Techflow/dataset/TechFlow.csv')

---
# PART 1: Quick Look at Data

**First 5 customers**

`head()` returns the first 5 rows by default. Great for a quick preview of your data.

```python
df.head()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**First 10 customers**

Pass a number to `head(n)` to get more rows.

```python
df.head(10)
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Last 5 customers**

`tail()` works like `head()` but from the end. Useful to check if data loaded completely.

```python
df.tail()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 2: How Big is Our Data?

**Rows and columns count**

`shape` returns a tuple: (rows, columns). No parentheses needed—it's a property, not a method.

```python
df.shape
```
Output: `(rows, columns)`
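
Because `shape` is an ordinary tuple, you can also unpack it into two variables. A minimal sketch, assuming a small made-up DataFrame called `demo` (a stand-in, not the TechFlow data):

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration
demo = pd.DataFrame({
    "CompanyName": ["Acme", "Birch", "Cobalt"],
    "MonthlyRevenue": [1200, 450, 800],
})

n_rows, n_cols = demo.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)        # 3 2
```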

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Row count only**

Access the first value with `[0]` to get just the row count.

```python
df.shape[0]
```
`[0]` = first value = rows

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Column count only**

Access `[1]` for the column count.

```python
df.shape[1]
```
`[1]` = second value = columns

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Column names**

`columns` lists every column in your DataFrame. Helpful to see what data is available.

```python
df.columns
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 3: Get One Column

**Company names**

Use `df['ColumnName']` to select a single column. Returns a Series (one-dimensional data).

```python
df['CompanyName']
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Industries**

Same pattern—just change the column name inside the brackets.

```python
df['Industry']
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Monthly revenue column**

Numeric columns work the same way as text columns.


```python
df['MonthlyRevenue']
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**First 5 company names**

Chain `.head()` after selecting a column to limit results.

```python
df['CompanyName'].head()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 4: Get Multiple Columns

**Multiple columns**

Use double brackets `[['col1', 'col2']]` to select multiple columns. Returns a DataFrame.

```python
df[['CompanyName', 'MonthlyRevenue']]
```
Note: Double brackets `[[  ]]` for multiple columns
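
The difference between single and double brackets is easiest to see by checking the type of the result. A small sketch on a made-up DataFrame named `demo` (hypothetical data, not the TechFlow file):

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration
demo = pd.DataFrame({
    "CompanyName": ["Acme", "Birch", "Cobalt"],
    "MonthlyRevenue": [1200, 450, 800],
})

one_col = demo["CompanyName"]        # single brackets -> Series
one_col_df = demo[["CompanyName"]]   # double brackets -> DataFrame, even with one column

print(type(one_col).__name__)     # Series
print(type(one_col_df).__name__)  # DataFrame
```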

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Three columns**

Add more column names to the list inside double brackets.


```python
df[['CompanyName', 'Industry', 'SeatCount']]
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Multiple columns + head**

Chain `.head()` to preview a subset.

```python
df[['CompanyName', 'MonthlyRevenue']].head()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 5: Get Specific Rows

**First row by position**

`iloc[n]` selects a row by integer position. Python starts counting at 0.

```python
df.iloc[0]
```
`iloc` = integer location

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Fifth row**

Position 4 is the 5th customer, because Python counts from 0 (0, 1, 2, 3, 4).

```python
df.iloc[4]
```
Remember: Python starts at 0

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Range of rows**

`iloc[5:10]` gets rows 5-9. The end value (10) is excluded.

```python
df.iloc[5:10]
```
`5:10` means rows 5,6,7,8,9 (10 is excluded)
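
To confirm that the end of the slice is excluded, you can count the rows that come back. A quick sketch, assuming a made-up 12-row DataFrame named `demo` with a hypothetical `CustomerID` column:

```python
import pandas as pd

# Hypothetical stand-in data: 12 rows at positions 0..11
demo = pd.DataFrame({"CustomerID": range(100, 112)})

subset = demo.iloc[5:10]    # positions 5, 6, 7, 8, 9
print(len(subset))          # 5
print(list(subset.index))   # [5, 6, 7, 8, 9]
```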

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**First 3 rows**

`:3` means "from start up to (but not including) 3".

```python
df.iloc[:3]
```
`:3` means from start to row 2

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 6: Get Specific Cell

**Specific cell by position**

`iloc[row, col]` gets one cell. CompanyName is column index 1.

```python
df.iloc[3, 1]
```
`[row, column]` - CompanyName is column 1

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Specific cell by label**

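`loc[row, 'ColumnName']` uses labels instead of positions, so you can refer to the column by name. More readable. The row value is also a label; with the default index (0, 1, 2, ...) it happens to match the row position.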

```python
df.loc[3, 'CompanyName']
```
`loc` uses labels, `iloc` uses numbers
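
The two only differ visibly when the row labels are not 0, 1, 2, and so on. A small sketch, assuming a made-up DataFrame `demo` whose index labels (10, 20, 30) were chosen purely for illustration:

```python
import pandas as pd

# Hypothetical stand-in data with a custom (non-default) index
demo = pd.DataFrame(
    {"CompanyName": ["Acme", "Birch", "Cobalt"],
     "MonthlyRevenue": [1200, 450, 800]},
    index=[10, 20, 30],
)

by_position = demo.iloc[1, 0]            # second row, first column
by_label = demo.loc[20, "CompanyName"]   # row labelled 20, column by name

print(by_position, by_label)  # Birch Birch
# demo.loc[1, "CompanyName"] would raise a KeyError: there is no row labelled 1
```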

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Another cell example**

Same pattern with a different column: `loc[row, 'ColumnName']` accesses any cell by row label and column name.

```python
df.loc[0, 'MonthlyRevenue']
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 7: Basic Stats

**Sum of a column**

`.sum()` adds up all values in the column.

```python
df['MonthlyRevenue'].sum()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Average (mean)**

`.mean()` calculates the average.

```python
df['MonthlyRevenue'].mean()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Maximum value**

`.max()` finds the largest value.

```python
df['MonthlyRevenue'].max()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Minimum value**

`.min()` finds the smallest value.

```python
df['MonthlyRevenue'].min()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Average seats**

Same `.mean()` on a different column.

```python
df['SeatCount'].mean()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Total seats**

`.sum()` works on any numeric column.

```python
df['SeatCount'].sum()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 8: Counting Values

**Count unique values**

`.nunique()` counts distinct values. Useful for categorical columns.

```python
df['Industry'].nunique()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Value frequency**

`.value_counts()` shows how many times each value appears.

```python
df['Industry'].value_counts()
```
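
`nunique()` and `value_counts()` describe the same column from two angles: how many distinct values there are, and how often each one appears. A sketch on made-up data (`demo` is hypothetical); the `normalize=True` option, which returns shares instead of counts, is shown as an optional extra:

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration
demo = pd.DataFrame({
    "Industry": ["Technology", "Healthcare", "Technology", "Finance"],
})

counts = demo["Industry"].value_counts()                 # rows per industry
shares = demo["Industry"].value_counts(normalize=True)   # fraction of rows per industry

print(demo["Industry"].nunique())  # 3
print(counts["Technology"])        # 2
print(shares["Technology"])        # 0.5
```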

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Plan distribution**

Same method on a different column: `value_counts()` shows how many customers are on each plan.

```python
df['SubscriptionPlan'].value_counts()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Support tier breakdown**

Repeat the same method for any categorical column.


```python
df['SupportTier'].value_counts()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 9: Describe Data

**Quick statistics**

`.describe()` gives count, mean, std, min, 25%, 50%, 75%, max.

```python
df['MonthlyRevenue'].describe()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Seat statistics**

Same method on a different numeric column: `describe()` gives count, mean, std, min, quartiles, and max.

```python
df['SeatCount'].describe()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Text column stats**

For text columns, you get count, unique values, most common (top), and its frequency.


```python
df['Industry'].describe()
```
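
The result of `describe()` is itself a Series, so each of the four fields can be read by name. A minimal sketch on made-up data (`demo` is a hypothetical stand-in for the real file):

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration
demo = pd.DataFrame({
    "Industry": ["Technology", "Healthcare", "Technology", "Finance"],
})

summary = demo["Industry"].describe()

print(summary["count"])   # number of non-missing values
print(summary["unique"])  # number of distinct values
print(summary["top"])     # most common value
print(summary["freq"])    # how often the most common value appears
```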

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 10: Filter Data (Boolean)

**Create a filter**

Comparison operators (`==`, `>`, `<`) return True or False for each row. The result is a boolean Series.

Step 1 - Create a filter:
```python
df['Cancelled'] == 1
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


Step 2 - Save the filter:
```python
is_cancelled = df['Cancelled'] == 1
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


Step 3 - Apply the filter:
```python
df[is_cancelled]
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**How many customers cancelled?**

`sum()` on a boolean Series counts True values. True=1, False=0.

```python
is_cancelled.sum()
```
True=1, False=0, so sum counts True
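
The three steps plus the count can be traced end to end on a tiny example. A sketch assuming a made-up five-row DataFrame `demo` with a `Cancelled` column (the values are invented):

```python
import pandas as pd

# Hypothetical stand-in data: 1 = cancelled, 0 = active
demo = pd.DataFrame({"Cancelled": [1, 0, 0, 1, 0]})

is_cancelled = demo["Cancelled"] == 1   # boolean Series, one value per row

print(is_cancelled.tolist())     # [True, False, False, True, False]
print(is_cancelled.sum())        # 2
print(len(demo[is_cancelled]))   # 2, the same count via the filtered rows
```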

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 11: More Filters

**Enterprise customers**

Put the filter inside `df[...]` to get only the rows where the condition is True.

```python
df[df['SubscriptionPlan'] == 'Enterprise']
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Technology industry**

Same pattern—filter by any column value.

```python
df[df['Industry'] == 'Technology']
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Filter by number**

Use `>`, `<`, `>=`, `<=`, `!=` for numeric comparisons.

```python
df[df['MonthlyRevenue'] > 500]
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Another numeric filter**


Same pattern with a different threshold.

```python
df[df['SeatCount'] > 50]
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Count filtered rows**

Wrap the filter in parentheses, then call `.sum()` to count the matches.

```python
(df['MonthlyRevenue'] > 500).sum()
```
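
Counting with `.sum()` gives the same number as filtering the rows and measuring the result, without building the filtered DataFrame. A short sketch on made-up revenue values (`demo` is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration
demo = pd.DataFrame({"MonthlyRevenue": [1200, 450, 800, 300]})

n_via_sum = (demo["MonthlyRevenue"] > 500).sum()       # count the True values
n_via_len = len(demo[demo["MonthlyRevenue"] > 500])    # count the filtered rows

print(n_via_sum, n_via_len)  # 2 2
```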

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PART 12: Combine Filter + Select

**Filter + select column**

Filter first with `df[condition]`, then select a column with `['ColumnName']`.


```python
df[df['Cancelled'] == 1]['CompanyName']
```
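
The same filter-then-select can also be written in one step with `loc`, passing the condition as the row selector and the column name as the column selector. A sketch on made-up data (`demo` is hypothetical), showing that both forms return the same values here:

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration
demo = pd.DataFrame({
    "CompanyName": ["Acme", "Birch", "Cobalt"],
    "Cancelled": [1, 0, 1],
})

chained = demo[demo["Cancelled"] == 1]["CompanyName"]        # filter, then select
with_loc = demo.loc[demo["Cancelled"] == 1, "CompanyName"]   # filter and select together

print(chained.tolist())          # ['Acme', 'Cobalt']
print(chained.equals(with_loc))  # True
```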

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Filter + different column**

Same technique: get one column from filtered data. Here, revenue for Enterprise customers only.

```python
df[df['SubscriptionPlan'] == 'Enterprise']['MonthlyRevenue']
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Filter + calculate**

Chain `.mean()` to get the average of filtered data.

```python
df[df['SubscriptionPlan'] == 'Enterprise']['MonthlyRevenue'].mean()
```

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


**Filter + sum**

Combine filter with `.sum()` for totals by category.

```python
df[df['Industry'] == 'Technology']['MonthlyRevenue'].sum()
```
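
With numbers small enough to add by hand, you can check that the chain does what you expect. A sketch on made-up data (`demo` is hypothetical, not the TechFlow file):

```python
import pandas as pd

# Hypothetical stand-in data; the values are invented for illustration
demo = pd.DataFrame({
    "Industry": ["Technology", "Healthcare", "Technology"],
    "MonthlyRevenue": [1200, 450, 800],
})

tech_revenue = demo[demo["Industry"] == "Technology"]["MonthlyRevenue"]

tech_total = tech_revenue.sum()    # 1200 + 800
tech_mean = tech_revenue.mean()    # (1200 + 800) / 2

print(tech_total)  # 2000
print(tech_mean)   # 1000.0
```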

In [None]:
# ↓ Type the code below, then press Shift+Enter to run


---
# PRACTICE: Answer These Business Questions

### Q1: How many customers are on the Basic plan?

In [None]:
# Your answer:

### Q2: What is the average tenure (TenureMonths) of our customers?

In [None]:
# Your answer:


### Q3: Show the names of customers in the Healthcare industry

In [None]:
# Your answer:


### Q4: What is the total revenue from Gold support tier customers?

In [None]:
# Your answer:


### Q5: How many customers have NPS_Score >= 9?

In [None]:
# Your answer:


### Q6: What is the maximum number of seats any customer has?

In [None]:
# Your answer:


### Q7: Show first 5 rows of CompanyName, Industry, and MonthlyRevenue

In [None]:
# Your answer:


### Q8: How many customers are NOT cancelled (Cancelled == 0)?

In [None]:
# Your answer:


---
# CHEAT SHEET

| What you want | Code |
|---------------|------|
| First 5 rows | `df.head()` |
| Last 5 rows | `df.tail()` |
| Row/col count | `df.shape` |
| Column names | `df.columns` |
| One column | `df['col']` |
| Multiple columns | `df[['col1','col2']]` |
| One row | `df.iloc[0]` |
| Row range | `df.iloc[5:10]` |
| One cell | `df.loc[0, 'col']` |
| Sum | `df['col'].sum()` |
| Average | `df['col'].mean()` |
| Max/Min | `df['col'].max()` / `df['col'].min()` |
| Count unique | `df['col'].nunique()` |
| Value counts | `df['col'].value_counts()` |
| Stats | `df['col'].describe()` |
| Filter | `df[df['col'] == 'value']` |
| Filter + count | `(df['col'] == 'value').sum()` |

---
## Module 1 Complete!

You now know how to:
- View data (head, tail, shape)
- Select columns and rows
- Calculate basic stats (sum, mean, max, min)
- Count and describe values
- Filter data

Next: Data Exploration and Visualization