In [2]:
import nbformat as nbf

# Create a new notebook object
nb = nbf.v4.new_notebook()

# Define the text content with mixed markdown and code
text_content = [
    ("markdown", "# Pandas Series: Overview\n1. A Pandas Series is a one-dimensional labeled array capable of holding any data type, including integers, floats, strings, Python objects, and more.\n2. It can be thought of as a single column in a spreadsheet or a single column of a DataFrame.\n3. Each element in a Series has an associated label or index, which allows for fast and easy data retrieval."),
    
    ("markdown", "## Key Features of a Pandas Series\n1. **Homogeneous Data**: A Series has a single dtype; mixed values are stored under the generic `object` dtype.\n2. **Labeled Index**: Each element has an index label, so you can access elements by label as well as by position.\n3. **Automatic Alignment**: Series objects align automatically based on their index when performing operations.\n4. **Flexible Indexing**: Series supports both label-based (`.loc`) and position-based (`.iloc`) indexing."),
    
    ("markdown", "## Creating a Pandas Series\nThere are several ways to create a Pandas Series:"),
    
    ("markdown", "### 1. From a Python List:"),
    ("code", """import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)
"""),
    ("markdown", "Output:\n```\n0    10\n1    20\n2    30\n3    40\ndtype: int64\n```"),
    
    ("markdown", "### 2. From a Dictionary:"),
    ("code", """data = {'a': 10, 'b': 20, 'c': 30, 'd': 40}
s = pd.Series(data)
print(s)
"""),
    ("markdown", "Output:\n```\na    10\nb    20\nc    30\nd    40\ndtype: int64\n```"),
    
    ("markdown", "### 3. From a Scalar Value:\nYou can create a Series with the same value repeated across a specified index."),
    ("code", """s = pd.Series(5, index=['a', 'b', 'c', 'd'])
print(s)
"""),
    ("markdown", "Output:\n```\na    5\nb    5\nc    5\nd    5\ndtype: int64\n```"),
    
    ("markdown", "### 4. Custom Indexing:\nYou can provide a custom index while creating a Series."),
    ("code", """data = [10, 20, 30, 40]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s)
"""),
    ("markdown", "Output:\n```\na    10\nb    20\nc    30\nd    40\ndtype: int64\n```"),
    
    ("markdown", "## Accessing Data in a Series"),
    
    ("markdown", "### 1. By Index Label:\n**Accessing a single element:**"),
    ("code", """print(s['b'])
"""),
    ("markdown", "Output:\n```\n20\n```"),
    
    ("markdown", "**Accessing multiple elements:**"),
    ("code", """print(s[['a', 'c']])
"""),
    ("markdown", "Output:\n```\na    10\nc    30\ndtype: int64\n```"),
    
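    ("markdown", "### 2. By Integer Location:\nUse `.iloc` for position-based access; plain `s[1]` on a label-indexed Series is deprecated in recent pandas versions.\n\n**Accessing a single element:**"),
    ("code", """print(s.iloc[1])
"""),
    ("markdown", "Output:\n```\n20\n```"),
    
    ("markdown", "**Accessing a range of elements:**"),
    ("code", """print(s.iloc[1:3])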
"""),
    ("markdown", "Output:\n```\nb    20\nc    30\ndtype: int64\n```"),
    
    ("markdown", "## Vectorized Operations on Series\nOne of the powerful features of Pandas is the ability to perform vectorized operations on Series, which means applying an operation to each element in the Series without the need for explicit loops."),
    
    ("markdown", "### 1. Arithmetic Operations:"),
    ("code", """s = pd.Series([1, 2, 3, 4])
print(s + 10)
"""),
    ("markdown", "Output:\n```\n0    11\n1    12\n2    13\n3    14\ndtype: int64\n```"),
    
    ("markdown", "### 2. Element-wise Operations:"),
    ("code", """s = pd.Series([1, 2, 3, 4])
print(s * s)
"""),
    ("markdown", "Output:\n```\n0     1\n1     4\n2     9\n3    16\ndtype: int64\n```"),
    
    ("markdown", "### 3. Using NumPy Functions:"),
    ("code", """import numpy as np
s = pd.Series([1, 2, 3, 4])
print(np.exp(s))
"""),
    ("markdown", "Output:\n```\n0     2.718282\n1     7.389056\n2    20.085537\n3    54.598150\ndtype: float64\n```"),
    
    ("markdown", "## Handling Missing Data in Series\nPandas Series has built-in support for handling missing data using NaN (Not a Number)."),
    
    ("markdown", "### 1. Creating a Series with Missing Data:"),
    ("code", """s = pd.Series([1, 2, np.nan, 4])
print(s)
"""),
    ("markdown", "Output:\n```\n0    1.0\n1    2.0\n2    NaN\n3    4.0\ndtype: float64\n```"),
    
    ("markdown", "### 2. Detecting Missing Data:"),
    ("code", """print(s.isnull())
"""),
    ("markdown", "Output:\n```\n0    False\n1    False\n2     True\n3    False\ndtype: bool\n```"),
    
    ("markdown", "### 3. Filling Missing Data:"),
    ("code", """print(s.fillna(0))
"""),
    ("markdown", "Output:\n```\n0    1.0\n1    2.0\n2    0.0\n3    4.0\ndtype: float64\n```"),
    
    ("markdown", "### 4. Dropping Missing Data:"),
    ("code", """print(s.dropna())
"""),
    ("markdown", "Output:\n```\n0    1.0\n1    2.0\n3    4.0\ndtype: float64\n```"),
    
    ("markdown", "## Example: Creating and Manipulating a Series\nLet’s consider a practical example where we create a Series of exam scores and perform some operations."),
    
    ("code", """import pandas as pd

# Create a Series with custom index
scores = pd.Series([85, 90, 78, 92, 88], index=['Alice', 'Bob', 'Charlie', 'David', 'Eva'])

# Display the Series
print("Scores:")
print(scores)

# Accessing a single score
print("\\nBob's score:", scores['Bob'])

# Calculating the average score
average_score = scores.mean()
print("\\nAverage score:", average_score)

# Adding 5 bonus points to all scores
bonus_scores = scores + 5
print("\\nScores after adding bonus points:")
print(bonus_scores)

# Identify students who scored above 90 after the bonus
high_scorers = bonus_scores[bonus_scores > 90]
print("\\nStudents scoring above 90 after bonus:")
print(high_scorers)
"""),
    ("markdown", "Output:\n```\nScores:\nAlice      85\nBob        90\nCharlie    78\nDavid      92\nEva        88\ndtype: int64\n\nBob's score: 90\n\nAverage score: 86.6\n\nScores after adding bonus points:\nAlice      90\nBob        95\nCharlie    83\nDavid      97\nEva        93\ndtype: int64\n\nStudents scoring above 90 after bonus:\nBob       95\nDavid     97\nEva       93\ndtype: int64\n```")
]

# Convert each section into markdown or code cells
cells = []
for cell_type, content in text_content:
    if cell_type == "code":
        cells.append(nbf.v4.new_code_cell(content.strip()))
    else:
        cells.append(nbf.v4.new_markdown_cell(content))

# Assign the cells to the notebook
nb['cells'] = cells

# Write the notebook to a file with UTF-8 encoding
with open('series.ipynb', 'w', encoding='utf-8') as f:
    nbf.write(nb, f)
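
The "Automatic Alignment" feature listed in the Series notebook has no example there. The following is a minimal sketch of what it could look like; the Series names `q1` and `q2` and their values are made up for illustration.

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical data).
q1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
q2 = pd.Series([1, 2, 3], index=["b", "c", "d"])

# Addition aligns on labels, not positions. Labels present in only one
# Series produce NaN, which also upcasts the result to float64.
total = q1 + q2
print(total)

# Series.add with fill_value treats a label missing on one side as 0.
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```

Only the shared labels `b` and `c` get a numeric sum in `total`; `fill_value=0` is one way to keep the unmatched labels instead of turning them into NaN.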


In [2]:
import nbformat as nbf

# Create a new notebook
nb = nbf.v4.new_notebook()

# Define the content of the notebook in Markdown and code cells
cells = [
    # Title and Introduction
    nbf.v4.new_markdown_cell("# pandas.DataFrame.groupby\n"
                             "The pandas.DataFrame.groupby function is one of the most powerful features in pandas, used for grouping data based on one or more keys and then applying some aggregation, transformation, or other operations on those groups."
    ),
    
    # Basic Concept
    nbf.v4.new_markdown_cell("## 1. Basic Concept\n"
                             "DataFrame.groupby is similar to the SQL GROUP BY clause. It allows you to split the DataFrame into groups based on some criteria, perform operations on each group, and then combine the results back into a single DataFrame or Series."
    ),
    
    # Syntax
    nbf.v4.new_markdown_cell("## 2. Syntax\n\n"
                             "```python\n"
                             "DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)\n"
                             "```\n"
    ),
    
    # Steps in Using groupby
    nbf.v4.new_markdown_cell("## 3. Steps in Using groupby\n"
                             "1. **Splitting**: The data is split into groups based on the values of the specified key(s).\n"
                             "2. **Applying**: An operation is applied to each group independently. These operations can include:\n"
                             "   - Aggregation: e.g., sum, mean, count, etc.\n"
                             "   - Transformation: e.g., standardizing data within groups.\n"
                             "   - Filtration: e.g., filtering groups based on a condition.\n"
                             "3. **Combining**: The results are combined back into a DataFrame or Series."
    ),
    
    # Common Aggregations
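    nbf.v4.new_markdown_cell("## 4. Common Aggregations\n"
                             "Here are some common aggregation functions you can use after grouping:\n"
                             "- `sum()`: Sum of the values in each group.\n"
                             "- `mean()`: Mean of the values in each group.\n"
                             "- `count()`: Number of non-null values in each group.\n"
                             "- `min()` / `max()`: Smallest and largest value in each group."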
    ),
    
    # Example 1
    nbf.v4.new_markdown_cell("## 5. Examples\n\n"
                             "### Example 1: Basic Grouping and Aggregation"
    ),
    nbf.v4.new_code_cell("import pandas as pd\n\n"
                         "# Sample DataFrame\n"
                         "data = {\n"
                         "    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],\n"
                         "    'Values': [10, 20, 15, 25, 10, 30]\n"
                         "}\n"
                         "df = pd.DataFrame(data)\n\n"
                         "# Group by 'Category' and sum the 'Values'\n"
                         "grouped = df.groupby('Category').sum()\n\n"
                         "print(grouped)\n"
    ),
    nbf.v4.new_markdown_cell("• Explanation: The DataFrame is grouped by the Category column, and the sum() function is applied to the Values column. This results in a new DataFrame where the Values for each Category are summed."
    ),
    
    # Example 2
    nbf.v4.new_markdown_cell("### Example 2: Grouping by Multiple Columns"
    ),
    nbf.v4.new_code_cell("data = {\n"
                         "    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],\n"
                         "    'Sub-Category': ['X', 'Y', 'X', 'Y', 'X', 'Y'],\n"
                         "    'Values': [1, 2, 3, 4, 5, 6]\n"
                         "}\n"
                         "df = pd.DataFrame(data)\n\n"
                         "# Group by 'Category' and 'Sub-Category', then calculate the mean of 'Values'\n"
                         "grouped = df.groupby(['Category', 'Sub-Category']).mean()\n\n"
                         "print(grouped)\n"
    ),
    nbf.v4.new_markdown_cell("• Explanation: The DataFrame is grouped by both Category and Sub-Category. The mean() function is applied to the Values column, giving the mean for each combination of Category and Sub-Category."
    ),
    
    # Example 3
    nbf.v4.new_markdown_cell("### Example 3: Grouping and Applying Multiple Aggregations"
    ),
    nbf.v4.new_code_cell("data = {\n"
                         "    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],\n"
                         "    'Values': [10, 20, 15, 25, 10, 30]\n"
                         "}\n"
                         "df = pd.DataFrame(data)\n\n"
                         "# Group by 'Category' and apply multiple aggregation functions\n"
                         "grouped = df.groupby('Category')['Values'].agg(['sum', 'mean', 'count'])\n\n"
                         "print(grouped)\n"
    ),
    nbf.v4.new_markdown_cell("• Explanation: The DataFrame is grouped by the Category column, and multiple aggregation functions (sum, mean, and count) are applied to the Values column. The result is a new DataFrame with the specified aggregations."
    ),
    
    # Example 4
    nbf.v4.new_markdown_cell("### Example 4: Grouping and Filtering"
    ),
    nbf.v4.new_code_cell("data = {\n"
                         "    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],\n"
                         "    'Values': [10, 20, 10, 30, 50, 60]\n"
                         "}\n"
                         "df = pd.DataFrame(data)\n\n"
                         "# Group by 'Category' and filter groups where the sum of 'Values' > 50\n"
                         "filtered = df.groupby('Category').filter(lambda x: x['Values'].sum() > 50)\n\n"
                         "print(filtered)\n"
    ),
    nbf.v4.new_markdown_cell("• Explanation: The DataFrame is grouped by the Category column. The filter() function keeps only those groups where the sum of Values is greater than 50. In this case, only category 'C' meets the condition."
    ),
    
    # Advanced Usage
    nbf.v4.new_markdown_cell("## 6. Advanced Usage\n\n"
                             "### Example 5: Grouping and Transforming Data"
    ),
    nbf.v4.new_code_cell("data = {\n"
                         "    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],\n"
                         "    'Values': [10, 20, 15, 25, 10, 30]\n"
                         "}\n"
                         "df = pd.DataFrame(data)\n\n"
                         "# Group by 'Category' and subtract the mean of 'Values' within each group\n"
                         "df['Adjusted Values'] = df.groupby('Category')['Values'].transform(lambda x: x - x.mean())\n\n"
                         "print(df)\n"
    ),
    nbf.v4.new_markdown_cell("• Explanation: The DataFrame is grouped by Category, and the mean of Values within each group is subtracted from each value in that group. This operation is done using transform(), which returns a Series with the same shape as the original data."
    ),
    
    # Summary
    nbf.v4.new_markdown_cell("### Summary\n"
                             "• groupby() allows you to split your data into groups based on some criteria, perform operations on each group, and then combine the results.\n"
                             "• Aggregation functions like sum(), mean(), count(), etc., are commonly used with groupby.\n"
                             "• You can group by multiple columns, apply multiple aggregation functions, filter groups, or even transform data within groups.\n"
                             "• groupby() is essential for data analysis tasks where you need to summarize or manipulate data based on specific groupings."
    )
]

# Add cells to the notebook
nb.cells.extend(cells)

# Save the notebook to a file
with open('groupby.ipynb', 'w', encoding='utf-8') as f:
    nbf.write(nb, f)

print("Notebook 'groupby.ipynb' created successfully.")


Notebook 'groupby.ipynb' created successfully.
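
The split, apply, and combine steps described in the groupby notebook can also be written out by hand. The sketch below reuses the data from Example 1 and compares a manual version with the built-in aggregation; the variable names `partial_sums`, `manual`, and `builtin` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A", "B"],
    "Values": [10, 20, 15, 25, 10, 30],
})

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
# Apply: reduce each group's 'Values' column to a single number.
partial_sums = {
    key: group["Values"].sum() for key, group in df.groupby("Category")
}

# Combine: collect the per-group results into one labeled Series.
manual = pd.Series(partial_sums, name="Values")
manual.index.name = "Category"

# The one-line groupby aggregation computes the same per-group sums.
builtin = df.groupby("Category")["Values"].sum()

print(manual)
print(builtin)
```

In practice the built-in form is preferable; the manual loop is only meant to make the three steps visible.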


In [3]:
import nbformat as nbf

# Create a new notebook
nb = nbf.v4.new_notebook()

# Define the content of the notebook in Markdown and code cells
cells = [
    # Title and Introduction
    nbf.v4.new_markdown_cell("# pandas.Series.map\n"
                             "`Series.map` performs element-wise operations on a Series, such as a single DataFrame column. It maps or substitutes each value in the Series using a mapping correspondence (a dictionary, a function, or another Series)."
    ),
    
    # Basic Concept
    nbf.v4.new_markdown_cell("## 1. Basic Concept\n"
                             "• **Series.map** is used for mapping values in a Series from one set of values to another based on a provided mapping.\n"
                             "• This is often used for tasks like replacing values, applying functions to each element, or using a dictionary to transform the data."
    ),
    
    # Syntax
    nbf.v4.new_markdown_cell("## 2. Syntax\n\n"
                             "```python\n"
                             "Series.map(arg, na_action=None)\n"
                             "```\n"
                             "• **arg**: This can be a function, dictionary, or Series. It defines the mapping correspondence.\n"
                             "  - Function: A function to apply to each element of the Series.\n"
                             "  - Dictionary/Series: A mapping of values to new values.\n"
                             "• **na_action**: Either None or 'ignore'. If set to 'ignore', NaN values are propagated unchanged without being passed to the mapping."
    ),
    
    # Examples
    nbf.v4.new_markdown_cell("## 3. Examples\n\n"
                             "### Example 1: Mapping Using a Dictionary"
    ),
    nbf.v4.new_code_cell("import pandas as pd\n\n"
                         "# Sample DataFrame\n"
                         "data = {\n"
                         "    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],\n"
                         "    'Temperature': [75, 85, 60, 90, 100]\n"
                         "}\n"
                         "df = pd.DataFrame(data)\n\n"
                         "# Mapping cities to states using a dictionary\n"
                         "city_to_state = {\n"
                         "    'New York': 'NY',\n"
                         "    'Los Angeles': 'CA',\n"
                         "    'Chicago': 'IL',\n"
                         "    'Houston': 'TX',\n"
                         "    'Phoenix': 'AZ'\n"
                         "}\n\n"
                         "# Apply map function to 'City' column\n"
                         "df['State'] = df['City'].map(city_to_state)\n\n"
                         "print(df)\n"
    ),
    nbf.v4.new_markdown_cell("• **Explanation**:\n"
                             "  - The `map()` function is applied to the 'City' column, using the `city_to_state` dictionary to map each city to its corresponding state.\n"
                             "  - A new column 'State' is created in the DataFrame containing the mapped state values."
    ),
    
    # Example 2
    nbf.v4.new_markdown_cell("### Example 2: Applying a Function to Each Element"
    ),
    nbf.v4.new_code_cell("# Apply a lambda function to convert Fahrenheit to Celsius\n"
                         "df['Temperature_Celsius'] = df['Temperature'].map(lambda x: (x - 32) * 5.0/9.0)\n\n"
                         "print(df)\n"
    ),
    nbf.v4.new_markdown_cell("• **Explanation**:\n"
                             "  - The `map()` function is used here to apply a lambda function to each value in the 'Temperature' column.\n"
                             "  - The lambda function converts the temperature from Fahrenheit to Celsius, and the result is stored in a new column 'Temperature_Celsius'."
    ),
    
    # Example 3
    nbf.v4.new_markdown_cell("### Example 3: Handling Missing Mappings"
    ),
    nbf.v4.new_code_cell("# Adding a new city that is not in the mapping dictionary\n"
                         "df.loc[len(df.index)] = ['San Francisco', 65, None, None]\n\n"
                         "# Mapping cities to states with NaN handling\n"
                         "df['State'] = df['City'].map(city_to_state)\n\n"
                         "print(df)\n"
    ),
    nbf.v4.new_markdown_cell("• **Explanation**:\n"
                             "  - A new city 'San Francisco' is added to the DataFrame, but it is not included in the `city_to_state` dictionary.\n"
                             "  - When `map()` is called, it cannot find a match for 'San Francisco', so it assigns NaN to the 'State' column for that row.\n"
                             "  - `map()` naturally handles missing mappings by assigning NaN to unmatched values."
    ),
    
    # Example 4
    nbf.v4.new_markdown_cell("### Example 4: Mapping with a Series"
    ),
    nbf.v4.new_code_cell("# Creating a Series for mapping\n"
                         "state_population = pd.Series({\n"
                         "    'NY': 19.45,\n"
                         "    'CA': 39.51,\n"
                         "    'IL': 12.67,\n"
                         "    'TX': 28.7,\n"
                         "    'AZ': 7.28\n"
                         "}, name='Population_Millions')\n\n"
                         "# Map states to their populations\n"
                         "df['Population_Millions'] = df['State'].map(state_population)\n\n"
                         "print(df)\n"
    ),
    nbf.v4.new_markdown_cell("• **Explanation**:\n"
                             "  - A Series `state_population` is created, where the index represents the state abbreviations, and the values represent the population in millions.\n"
                             "  - The `map()` function maps the state abbreviations in the 'State' column to their corresponding population values.\n"
                             "  - The resulting population values are stored in a new column 'Population_Millions'."
    ),
    
    # Summary
    nbf.v4.new_markdown_cell("### Summary\n"
                             "• `map()` is mainly used with pandas Series to perform element-wise transformations based on a dictionary, Series, or function.\n"
                             "• It is commonly used to replace values, apply functions, and map values based on another Series or dictionary.\n"
                             "• **Handling missing values**: If a value in the Series is not found in the mapping, NaN is returned for that element.\n"
                             "• **Series vs. DataFrame**: `Series.map()` works on a single Series, so apply it to individual DataFrame columns. For an element-wise function over a whole DataFrame, pandas 2.1 and later provide `DataFrame.map()` (previously `applymap()`).\n\n"
                             "The versatility of `map()` makes it a valuable tool in data manipulation and transformation tasks."
    )
]

# Add cells to the notebook
nb.cells.extend(cells)

# Save the notebook to a file
with open('pandas_map.ipynb', 'w', encoding='utf-8') as f:
    nbf.write(nb, f)

print("Notebook 'pandas_map.ipynb' created successfully.")


Notebook 'pandas_map.ipynb' created successfully.
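
The map notebook describes `na_action` but never exercises it. A small sketch of the difference between the default and `na_action='ignore'` might look like this; the sample values and the names `with_default` and `with_ignore` are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

# Object dtype keeps the missing entry as a plain float NaN,
# independent of the pandas string-dtype settings.
s = pd.Series(["cat", "dog", np.nan], dtype=object)

# Default (na_action=None): the function is also called on NaN,
# so the f-string formats it as text.
with_default = s.map(lambda x: f"I am a {x}")

# na_action='ignore': NaN is passed through without calling the function.
with_ignore = s.map(lambda x: f"I am a {x}", na_action="ignore")

print(with_default)
print(with_ignore)
```

With the default, the missing entry is turned into a string, which is rarely intended; `na_action='ignore'` keeps it missing.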


In [4]:
import nbformat as nbf

# Create a new notebook
nb = nbf.v4.new_notebook()

# Define the content of the notebook in Markdown and code cells
cells = [
    # Title and Introduction
    nbf.v4.new_markdown_cell("# Creating DataFrames and Merging in pandas\n"
                             "In this notebook, we'll create two DataFrames and practice merging them using the `merge()` function in pandas. We'll cover various types of joins, including inner, left, right, and outer joins."
    ),
    
    # Create DataFrames
    nbf.v4.new_markdown_cell("## Creating DataFrames\n\n"
                             "We'll start by creating two DataFrames: one for employees and one for departments."
    ),
    nbf.v4.new_code_cell("import pandas as pd\n\n"
                         "# Dataset 1: Employees\n"
                         "employees = pd.DataFrame({\n"
                         "    'EmployeeID': [1, 2, 3, 4, 5],\n"
                         "    'Name': ['John Doe', 'Jane Smith', 'Mike Brown', 'Emily Davis', 'Anna White'],\n"
                         "    'DepartmentID': [101, 102, 101, 103, 104],\n"
                         "    'Salary': [50000, 60000, 45000, 70000, 48000]\n"
                         "})\n\n"
                         "# Dataset 2: Departments\n"
                         "departments = pd.DataFrame({\n"
                         "    'DepartmentID': [101, 102, 103, 105],\n"
                         "    'DepartmentName': ['HR', 'IT', 'Marketing', 'Sales']\n"
                         "})\n\n"
                         "# Display the DataFrames\n"
                         "print(\"Employees DataFrame:\")\n"
                         "print(employees)\n"
                         "print(\"\\nDepartments DataFrame:\")\n"
                         "print(departments)\n"
    ),
    
    # Merge Operations
    nbf.v4.new_markdown_cell("## Example Merge Operations\n\n"
                             "We can perform various merge operations using the `merge()` function. Here are some common types of joins:"
    ),
    
    # Inner Join
    nbf.v4.new_markdown_cell("### Inner Join\n"
                             "Merging on `DepartmentID` to get only employees who have a matching department."
    ),
    nbf.v4.new_code_cell("# Inner join\n"
                         "merged_inner = pd.merge(employees, departments, on='DepartmentID', how='inner')\n\n"
                         "print(\"\\nInner Join Result:\")\n"
                         "print(merged_inner)\n"
    ),
    
    # Left Join
    nbf.v4.new_markdown_cell("### Left Join\n"
                             "Keeping all employees and adding department information where available."
    ),
    nbf.v4.new_code_cell("# Left join\n"
                         "merged_left = pd.merge(employees, departments, on='DepartmentID', how='left')\n\n"
                         "print(\"\\nLeft Join Result:\")\n"
                         "print(merged_left)\n"
    ),
    
    # Right Join
    nbf.v4.new_markdown_cell("### Right Join\n"
                             "Keeping all departments and adding employee information where available."
    ),
    nbf.v4.new_code_cell("# Right join\n"
                         "merged_right = pd.merge(employees, departments, on='DepartmentID', how='right')\n\n"
                         "print(\"\\nRight Join Result:\")\n"
                         "print(merged_right)\n"
    ),
    
    # Outer Join
    nbf.v4.new_markdown_cell("### Outer Join\n"
                             "Combining all employees and departments, regardless of whether they have matching data."
    ),
    nbf.v4.new_code_cell("# Outer join\n"
                         "merged_outer = pd.merge(employees, departments, on='DepartmentID', how='outer')\n\n"
                         "print(\"\\nOuter Join Result:\")\n"
                         "print(merged_outer)\n"
    ),
    
    # Summary
    nbf.v4.new_markdown_cell("## Summary\n"
                             "In this notebook, we demonstrated how to create DataFrames and perform various merge operations using the `merge()` function in pandas. We covered:\n"
                             "- **Inner Join**: Merges only rows with matching keys in both DataFrames.\n"
                             "- **Left Join**: Keeps all rows from the left DataFrame and adds matching rows from the right DataFrame.\n"
                             "- **Right Join**: Keeps all rows from the right DataFrame and adds matching rows from the left DataFrame.\n"
                             "- **Outer Join**: Combines all rows from both DataFrames, with NaN for missing matches.\n\n"
                             "These operations are useful for combining datasets and performing relational data analysis."
    )
]

# Add cells to the notebook
nb.cells.extend(cells)

# Save the notebook to a file
with open('merge_example.ipynb', 'w', encoding='utf-8') as f:
    nbf.write(nb, f)

print("Notebook 'merge_example.ipynb' created successfully.")


Notebook 'merge_example.ipynb' created successfully.
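
To see how the four join types in the merge notebook differ on its sample data, one option is to compare row counts and to use the `indicator=True` argument of `pd.merge`, which adds a `_merge` column recording where each row came from. This is a sketch on a trimmed copy of the notebook's DataFrames (only the key columns are kept); `row_counts` and `outer` are illustrative names.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "DepartmentID": [101, 102, 101, 103, 104],
})
departments = pd.DataFrame({
    "DepartmentID": [101, 102, 103, 105],
    "DepartmentName": ["HR", "IT", "Marketing", "Sales"],
})

# Number of result rows for each join type.
row_counts = {
    how: len(pd.merge(employees, departments, on="DepartmentID", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# indicator=True labels each row as 'both', 'left_only', or 'right_only'.
outer = pd.merge(
    employees, departments, on="DepartmentID", how="outer", indicator=True
)
print(outer[["DepartmentID", "_merge"]])
```

Department 104 appears only in `employees` and department 105 only in `departments`, so the outer join is the only one that keeps both of those rows.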


In [5]:
import nbformat as nbf

# Create a new notebook
nb = nbf.v4.new_notebook()

# Define the content of the notebook in Markdown and code cells
cells = [
    # Title and Introduction
    nbf.v4.new_markdown_cell("# Using pandas.DataFrame.pivot\n"
                             "The `pandas.DataFrame.pivot` function is a powerful tool for reshaping DataFrames. It transforms data from a long format to a wide format, which is useful for data analysis and reporting.\n\n"
                             "In this notebook, we will explore how to use `pivot` to reorganize and summarize data. We will cover basic usage, handling missing data, reshaping data back, and using MultiIndex."
    ),
    
    # Basic Concept
    nbf.v4.new_markdown_cell("## Basic Concept\n\n"
                             "`pandas.DataFrame.pivot` rearranges the data in your DataFrame by converting unique values from one column into new columns and organizing the data according to a given index and values.\n\n"
                             "It is mainly used to reorganize and summarize data for better analysis."
    ),
    
    # Syntax
    nbf.v4.new_markdown_cell("## Syntax\n\n"
                             "`DataFrame.pivot(index=None, columns=None, values=None)`\n\n"
                             "- **index**: The column to use as the new DataFrame’s index. If None, uses the existing index.\n"
                             "- **columns**: The column whose unique values will become the columns in the pivoted DataFrame.\n"
                             "- **values**: The column to fill the new DataFrame's values. If None, all remaining columns are used."
    ),
    
    # Basic Example
    nbf.v4.new_markdown_cell("## Basic Example\n\n"
                             "Let's start with a simple example to understand how pivot works."
    ),
    nbf.v4.new_code_cell("import pandas as pd\n\n"
                         "# Create a simple DataFrame\n"
                         "data = {\n"
                         "    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],\n"
                         "    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles'],\n"
                         "    'Temperature': [32, 75, 30, 78],\n"
                         "    'Humidity': [80, 20, 85, 18]\n"
                         "}\n\n"
                         "df = pd.DataFrame(data)\n"
                         "print(\"Original DataFrame:\")\n"
                         "print(df)\n\n"
                         "# Pivot the DataFrame\n"
                         "pivot_df = df.pivot(index='Date', columns='City', values='Temperature')\n"
                         "print(\"\\nPivoted DataFrame:\")\n"
                         "print(pivot_df)\n"
    ),
    
    # Pivot with Multiple Values
    nbf.v4.new_markdown_cell("## Pivot with Multiple Values\n\n"
                             "You can also pivot multiple columns by specifying a list of column names in the `values` parameter."
    ),
    nbf.v4.new_code_cell("# Pivot with multiple values\n"
                         "pivot_df = df.pivot(index='Date', columns='City', values=['Temperature', 'Humidity'])\n"
                         "print(\"\\nPivoted DataFrame with multiple values:\")\n"
                         "print(pivot_df)\n"
    ),
    
    # Handling Missing Data
    nbf.v4.new_markdown_cell("## Handling Missing Data\n\n"
                             "If some index/column combinations never occur in the original DataFrame, the corresponding cells of the pivoted DataFrame contain NaN."
    ),
    nbf.v4.new_code_cell("# Add a 2023-01-03 reading for New York only, so Los Angeles has no value for that date\n"
                         "data_with_missing = {\n"
                         "    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],\n"
                         "    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],\n"
                         "    'Temperature': [32, 75, 30, 78, 28],\n"
                         "    'Humidity': [80, 20, 85, 18, 90]\n"
                         "}\n\n"
                         "df_missing = pd.DataFrame(data_with_missing)\n"
                         "print(\"\\nOriginal DataFrame with missing data:\")\n"
                         "print(df_missing)\n\n"
                         "# Pivot the DataFrame\n"
                         "pivot_df_missing = df_missing.pivot(index='Date', columns='City', values='Temperature')\n"
                         "print(\"\\nPivoted DataFrame with missing data:\")\n"
                         "print(pivot_df_missing)\n"
    ),
    
    # Reshaping Back
    nbf.v4.new_markdown_cell("## Reshaping Back: From Pivoted DataFrame to Original\n\n"
                             "If you need to reshape the pivoted DataFrame back to a long format, you can use the `pandas.DataFrame.melt` function. Cells that were NaN in the pivoted DataFrame come back as rows with NaN values; call `dropna()` to remove them."
    ),
    nbf.v4.new_code_cell("# Reshape back using melt\n"
                         "melted_df = pivot_df_missing.reset_index().melt(id_vars='Date', value_name='Temperature')\n"
                         "print(\"\\nMelted DataFrame:\")\n"
                         "print(melted_df)\n"
    ),
    
    # Pivoting with MultiIndex
    nbf.v4.new_markdown_cell("## Pivoting with MultiIndex\n\n"
                             "You can create a pivoted DataFrame with multiple index levels (MultiIndex) by using multiple columns in the `index` parameter."
    ),
    nbf.v4.new_code_cell("# Create a DataFrame with more complex data\n"
                         "data_multi = {\n"
                         "    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],\n"
                         "    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],\n"
                         "    'Type': ['Temperature', 'Temperature', 'Temperature', 'Temperature', 'Temperature'],\n"
                         "    'Value': [32, 75, 30, 78, 28]\n"
                         "}\n\n"
                         "df_multi = pd.DataFrame(data_multi)\n"
                         "print(\"\\nOriginal DataFrame with multiple index levels:\")\n"
                         "print(df_multi)\n\n"
                         "# Pivot with MultiIndex\n"
                         "pivot_df_multi = df_multi.pivot(index=['Date', 'Type'], columns='City', values='Value')\n"
                         "print(\"\\nPivoted DataFrame with MultiIndex:\")\n"
                         "print(pivot_df_multi)\n"
    ),
    
    # Pivoting without Aggregation
    nbf.v4.new_markdown_cell("## Pivoting without Aggregation: Difference from pivot_table\n\n"
                             "Unlike `pivot_table`, which aggregates duplicate entries (using the mean by default), `pivot` performs no aggregation and only reshapes the data. If an index/columns pair appears more than once, `pivot` raises a `ValueError`."
    ),
    
    # Summary
    nbf.v4.new_markdown_cell("## Summary\n\n"
                             "The `pandas.DataFrame.pivot` function is used to reshape DataFrames by transforming data into a wide format, making it easier to analyze and report. Key points include:\n\n"
                             "- **Parameters**:\n"
                             "  - **index**: Determines the new index for the pivoted DataFrame.\n"
                             "  - **columns**: Defines the new columns based on unique values from this column.\n"
                             "  - **values**: Specifies which column's values to use for filling the new DataFrame.\n"
                             "- **Handling Missing Data**: The resulting pivoted DataFrame will show NaN for missing values.\n"
                             "- **Reshaping**: Use `melt` to reshape data back to its long format.\n"
                             "- **MultiIndex**: Supports creating pivoted DataFrames with hierarchical indices.\n\n"
                             "Understanding how to use `pivot` effectively can help you organize and analyze data more efficiently, especially when dealing with complex datasets."
    )
]

# Add cells to the notebook
nb.cells.extend(cells)

# Save the notebook to a file
with open('pivot_example.ipynb', 'w', encoding='utf-8') as f:
    nbf.write(nb, f)

print("Notebook 'pivot_example.ipynb' created successfully.")


Notebook 'pivot_example.ipynb' created successfully.
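
To make the contrast between `pivot` and `pivot_table` concrete, here is a minimal standalone sketch that is not part of the generated notebook. The DataFrame `df_dup` and its values are hypothetical, chosen so that one Date/City pair appears twice. Under that assumption, `pivot` should refuse to reshape, while `pivot_table` resolves the duplicate by averaging.

```python
import pandas as pd

# Hypothetical data: the pair ('2023-01-01', 'New York') appears twice.
df_dup = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
    'City': ['New York', 'New York', 'New York'],
    'Temperature': [32, 34, 30],
})

# pivot has no rule for choosing between duplicates, so it raises ValueError.
try:
    df_dup.pivot(index='Date', columns='City', values='Temperature')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(f"pivot failed: {exc}")

# pivot_table resolves duplicates by aggregating them (here with the mean).
table = df_dup.pivot_table(index='Date', columns='City',
                           values='Temperature', aggfunc='mean')
print(table)
```

If duplicates are expected in the data, `pivot_table` with an explicit `aggfunc` is usually the safer choice; `pivot` is useful precisely because it fails loudly when the data is not unique.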


In [1]:
import nbformat as nbf

# Create a new notebook object
nb = nbf.v4.new_notebook()

# List of cells to add to the notebook
cells = [
    nbf.v4.new_markdown_cell("# pandas.melt"),
    nbf.v4.new_markdown_cell("pandas.melt is a powerful function in pandas that allows you to transform or reshape a DataFrame from a wide format to a long format. This is often used when you need to normalize your data for easier analysis or to prepare it for specific types of visualizations or operations."),
    
    nbf.v4.new_markdown_cell("## 1. Basic Concept of pandas.melt"),
    nbf.v4.new_markdown_cell("* **Wide format:** Data is spread across multiple columns. Each column represents a different variable.\n"
                             "* **Long format:** Data is condensed into fewer columns, with one column identifying the variable type and another column holding the value.\n\n"
                             "pandas.melt essentially unpivots the DataFrame, making it longer by turning multiple columns into rows."),
    
    nbf.v4.new_markdown_cell("## 2. Syntax"),
    nbf.v4.new_markdown_cell("`pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)`"),
    nbf.v4.new_markdown_cell("* **frame:** The DataFrame to melt.\n"
                             "* **id_vars:** Columns to use as identifiers (i.e., columns that should remain fixed).\n"
                             "* **value_vars:** Columns to unpivot (i.e., columns to convert into rows). If not specified, all columns not listed in id_vars are used.\n"
                             "* **var_name:** Name to use for the variable column. If not specified, uses `frame.columns.name` if it is set, otherwise `'variable'`.\n"
                             "* **value_name:** Name to use for the value column.\n"
                             "* **col_level:** If columns are multi-indexed, this specifies which level to melt.\n"
                             "* **ignore_index:** If True, the original index is replaced by a default integer index; if False, it is retained (and repeated as needed)."),
    
    nbf.v4.new_markdown_cell("## 3. Basic Example of pandas.melt"),
    nbf.v4.new_code_cell("import pandas as pd\n\n"
                         "# Create a simple DataFrame\n"
                         "data = {\n"
                         "    'Date': ['2023-01-01', '2023-01-02'],\n"
                         "    'New York': [32, 30],\n"
                         "    'Los Angeles': [75, 78],\n"
                         "    'Chicago': [28, 27]\n"
                         "}\n\n"
                         "df = pd.DataFrame(data)\n"
                         "print('Original DataFrame:')\n"
                         "print(df)\n\n"
                         "# Melt the DataFrame\n"
                         "melted_df = pd.melt(df, id_vars=['Date'], value_vars=['New York', 'Los Angeles', 'Chicago'],\n"
                         "                    var_name='City', value_name='Temperature')\n"
                         "print('\\nMelted DataFrame:')\n"
                         "print(melted_df)"),
    nbf.v4.new_markdown_cell("* **Explanation:**\n"
                             "  * **id_vars:** The Date column remains fixed.\n"
                             "  * **value_vars:** The New York, Los Angeles, and Chicago columns are unpivoted.\n"
                             "  * **var_name:** The unpivoted column names are stored in the City column.\n"
                             "  * **value_name:** The values from the unpivoted columns are stored in the Temperature column."),
    
    nbf.v4.new_markdown_cell("## 4. Melt with Default Parameters"),
    nbf.v4.new_code_cell("# Melt without specifying value_vars\n"
                         "melted_df_default = pd.melt(df, id_vars=['Date'])\n"
                         "print('\\nMelted DataFrame with default value_vars:')\n"
                         "print(melted_df_default)"),
    nbf.v4.new_markdown_cell("* **Explanation:**\n"
                             "  * By default, melt uses all columns except id_vars as value_vars.\n"
                             "  * The `variable` column (the default name) holds the original column names, and the `value` column holds the data."),
    
    nbf.v4.new_markdown_cell("## 5. Changing var_name and value_name"),
    nbf.v4.new_code_cell("# Melt with custom var_name and value_name\n"
                         "melted_df_custom = pd.melt(df, id_vars=['Date'], var_name='Location', value_name='Temp')\n"
                         "print('\\nMelted DataFrame with custom var_name and value_name:')\n"
                         "print(melted_df_custom)"),
    nbf.v4.new_markdown_cell("* **Explanation:**\n"
                             "  * The variable column is renamed to Location.\n"
                             "  * The value column is renamed to Temp."),
    
    nbf.v4.new_markdown_cell("## 6. Using pandas.melt with MultiIndex Columns"),
    nbf.v4.new_code_cell("# Create a DataFrame with MultiIndex columns\n"
                         "arrays = [['Temperature', 'Temperature', 'Humidity', 'Humidity'],\n"
                         "          ['New York', 'Los Angeles', 'New York', 'Los Angeles']]\n"
                         "index = pd.MultiIndex.from_arrays(arrays, names=('Type', 'City'))\n\n"
                         "df_multi = pd.DataFrame([[32, 75, 80, 20], [30, 78, 85, 18]], columns=index, index=['2023-01-01', '2023-01-02'])\n"
                         "print('\\nOriginal DataFrame with MultiIndex columns:')\n"
                         "print(df_multi)\n\n"
                         "# Melt on the City level; reset_index(col_level=1) places the 'index' label on that same level\n"
                         "melted_df_multi = pd.melt(df_multi.reset_index(col_level=1), id_vars=['index'], col_level=1)\n"
                         "print('\\nMelted DataFrame with MultiIndex columns:')\n"
                         "print(melted_df_multi)"),
    nbf.v4.new_markdown_cell("* **Explanation:**\n"
                             "  * col_level=1 specifies that the second level of the columns (City) is used for melting.\n"
                             "  * The Type level is dropped from the result, so Temperature and Humidity values share a single value column. To keep both levels, melt without col_level, which yields one variable column per level."),
    
    nbf.v4.new_markdown_cell("## 7. Retaining the Index with ignore_index=False"),
    nbf.v4.new_code_cell("# Melt with ignore_index=False to keep the original index\n"
                         "melted_df_ignore_index = pd.melt(df, id_vars=['Date'], ignore_index=False)\n"
                         "print('\\nMelted DataFrame with original index retained:')\n"
                         "print(melted_df_ignore_index)"),
    nbf.v4.new_markdown_cell("* **Explanation:**\n"
                             "  * With ignore_index=False, the original index is retained in the melted DataFrame, so each index label repeats once per melted column."),
    
    nbf.v4.new_markdown_cell("## 8. When to Use pandas.melt"),
    nbf.v4.new_markdown_cell("* **Normalization:** Converts wide-format data (one column per variable) into a normalized long format for analysis.\n"
                             "* **Visualization:** Certain visualizations or statistical analyses require data in a long format.\n"
                             "* **Data Preparation:** Prepares data for certain types of operations, like grouping, merging, or applying functions that require long-format data."),
    
    nbf.v4.new_markdown_cell("## Summary"),
    nbf.v4.new_markdown_cell("* **pandas.melt** is used to reshape DataFrames from wide to long format.\n"
                             "* **Key Concepts:**\n"
                             "  * **id_vars:** Columns that remain fixed in the output DataFrame.\n"
                             "  * **value_vars:** Columns to unpivot into rows.\n"
                             "  * **var_name** and **value_name:** Custom names for the resulting columns.\n"
                             "  * **col_level:** Used for MultiIndex columns to specify which level to melt.\n"
                             "  * **ignore_index:** Determines whether to reset the index in the resulting DataFrame.\n"
                             "* **Applications:**\n"
                             "  * Normalizing data.\n"
                             "  * Preparing data for analysis, visualization, or further processing.\n\n"
                             "Understanding how to use pandas.melt effectively can help you manipulate and analyze your data more efficiently, particularly when dealing with complex datasets that need to be reshaped for specific tasks.")
]

# Add the cells to the notebook
nb['cells'] = cells

# Write the notebook to a file
with open("melt_example.ipynb", "w", encoding="utf-8") as f:
    nbf.write(nb, f)
    
print("Notebook 'melt_example.ipynb' created successfully.")


Notebook 'melt_example.ipynb' created successfully.
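
As a quick check of how `col_level` treats MultiIndex columns, here is a minimal standalone sketch, separate from the generated notebook. The frame `wide` and its values are made up for illustration. It suggests that melting on a single level drops the other level, whereas omitting `col_level` keeps one variable column per level.

```python
import pandas as pd

# Illustrative frame with two column levels: Type and City.
columns = pd.MultiIndex.from_arrays(
    [['Temperature', 'Temperature', 'Humidity', 'Humidity'],
     ['New York', 'Los Angeles', 'New York', 'Los Angeles']],
    names=('Type', 'City'),
)
wide = pd.DataFrame([[32, 75, 80, 20], [30, 78, 85, 18]],
                    columns=columns,
                    index=['2023-01-01', '2023-01-02'])

# Melting on level 1 keeps only the City labels; the Type level is discarded.
by_city = pd.melt(wide, col_level=1, ignore_index=False)
print(by_city)

# Without col_level, each column level becomes its own variable column.
both_levels = pd.melt(wide, ignore_index=False)
print(both_levels)
```

When the distinction between Temperature and Humidity matters, the second form is likely the one you want, since `by_city` mixes both measurements in a single `value` column.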
