# Pandas Tutorial - Part 56

This notebook introduces the pandas DataFrame class and covers:
- DataFrame constructor and basics
- Converting Series to other formats
- The `abs()` method for Series and DataFrames
- A practical example: data normalization and scaling

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

## DataFrame Constructor and Basics

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a dict-like container for Series objects and is the primary pandas data structure.
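
As a minimal sketch of the dict-like view described above (the names `frame`, `x`, `y`, and `z` are illustrative and do not appear elsewhere in this notebook), selecting a column returns a Series, and assigning to a new key adds a column in place:

```python
import pandas as pd

# Hypothetical example data with two columns of different dtypes
frame = pd.DataFrame({'x': [1, 2, 3], 'y': [0.5, 1.5, 2.5]})

# Column lookup works like dictionary access and returns a Series
col = frame['x']
print(type(col).__name__)

# Heterogeneous: each column keeps its own dtype
print(frame.dtypes)

# Size-mutable: assigning to a new key adds a column
frame['z'] = frame['x'] * 2
print(frame.shape)
```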

In [None]:
# Create a DataFrame from a dictionary of Series
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
df = pd.DataFrame({'col1': s1, 'col2': s2})
print("DataFrame from Series:")
print(df)

In [None]:
# Create a DataFrame from a dictionary of lists
data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 34, 29, 42],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}
df_dict = pd.DataFrame(data)
print("\nDataFrame from dictionary:")
print(df_dict)

In [None]:
# Create a DataFrame from a list of dictionaries
data_list = [
    {'name': 'John', 'age': 28, 'city': 'New York'},
    {'name': 'Anna', 'age': 34, 'city': 'Paris'},
    {'name': 'Peter', 'age': 29, 'city': 'Berlin'},
    {'name': 'Linda', 'age': 42, 'city': 'London'}
]
df_list = pd.DataFrame(data_list)
print("\nDataFrame from list of dictionaries:")
print(df_list)

In [None]:
# Create a DataFrame from a NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df_arr = pd.DataFrame(arr, columns=['A', 'B', 'C'], index=['row1', 'row2', 'row3'])
print("\nDataFrame from NumPy array:")
print(df_arr)

In [None]:
# Create a DataFrame with custom index and columns
df_custom = pd.DataFrame(
    data=np.random.randn(3, 2),  # 3 rows, 2 columns of random data
    index=['A', 'B', 'C'],
    columns=['col1', 'col2']
)
print("\nDataFrame with custom index and columns:")
print(df_custom)

In [None]:
# Create a DataFrame with a specific data type
df_dtype = pd.DataFrame(
    data={'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0]},
    dtype=np.float64
)
print("\nDataFrame with specific dtype:")
print(df_dtype)
print("\nDataFrame dtypes:")
print(df_dtype.dtypes)
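
The `dtype` argument of the constructor applies a single type to every column. When columns need different types, one option is to convert after construction with `astype` and a column-to-dtype mapping. A minimal sketch, where the frame `df_types` and the chosen target types are illustrative:

```python
import numpy as np
import pandas as pd

df_types = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0]})

# Convert each column to its own dtype using a dict mapping
df_types = df_types.astype({'A': np.float32, 'B': np.int64})
print(df_types.dtypes)
```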

In [None]:
# Create a DataFrame with planet data
# (mass in units of 10^24 kg, radius in km)
df_planets = pd.DataFrame(
    {'mass': [0.330, 4.87, 5.97],
     'radius': [2439.7, 6051.8, 6378.1]},
    index=['Mercury', 'Venus', 'Earth']
)
print("\nPlanets DataFrame:")
print(df_planets)

## Converting Series to Other Formats

pandas provides several `Series.to_*` methods that convert a Series to other formats for data interchange and storage, including dictionaries, DataFrames, strings, JSON, CSV, and Markdown.

In [None]:
# Create a Series for conversion examples
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'], name='example_series')
print("Original Series:")
print(s)

In [None]:
# Convert Series to dictionary
s_dict = s.to_dict()
print("\nSeries to dictionary:")
print(s_dict)

In [None]:
# Convert Series to DataFrame
s_frame = s.to_frame()
print("\nSeries to DataFrame:")
print(s_frame)

In [None]:
# Convert Series to DataFrame with custom name
s_frame_named = s.to_frame(name='custom_name')
print("\nSeries to DataFrame with custom name:")
print(s_frame_named)

In [None]:
# Convert Series to string
s_string = s.to_string()
print("\nSeries to string:")
print(s_string)

In [None]:
# Convert Series to JSON
s_json = s.to_json()
print("\nSeries to JSON:")
print(s_json)

In [None]:
# Convert Series to CSV string
s_csv = s.to_csv()
print("\nSeries to CSV string:")
print(s_csv)

In [None]:
# Convert Series to Markdown
try:
    s_markdown = s.to_markdown()
    print("\nSeries to Markdown:")
    print(s_markdown)
except ImportError:
    print("\nMarkdown conversion requires the 'tabulate' package.")
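
To check what a conversion preserves, it can help to round-trip the result. The sketch below rebuilds a Series from its dictionary form and parses the JSON output with the standard-library `json` module. The names `s_src`, `restored`, and `parsed` are illustrative.

```python
import json
import pandas as pd

s_src = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'], name='example_series')

# to_dict() maps index labels to values; the Series name is not included,
# so it is passed back in explicitly when rebuilding
as_dict = s_src.to_dict()
restored = pd.Series(as_dict, name=s_src.name)

# to_json() on a Series produces a JSON object keyed by index label by default
parsed = json.loads(s_src.to_json())

print(restored.to_dict() == s_src.to_dict())
print(parsed)
```

The dictionary and default JSON forms both drop the Series name, so it has to be carried separately if it matters downstream.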

## The `abs()` Method for DataFrames

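The `abs()` method returns a Series or DataFrame containing the absolute value of each element. It applies only when every element is numeric. For a complex number $a + bj$, the absolute value is the magnitude $\sqrt{a^2 + b^2}$. Timedelta values are also supported.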
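
A short sketch of how `abs()` treats a few input types: a complex value, a negative Timedelta, and a frame that also contains a text column. Restricting the call with `select_dtypes` is one possible way to handle the last case. The frame `mixed` and its column names are hypothetical.

```python
import pandas as pd

# Complex input: abs() returns the magnitude, here sqrt(3**2 + 4**2)
magnitude = pd.Series([3 + 4j]).abs()
print(magnitude[0])

# Negative Timedelta: abs() returns the positive duration
duration = pd.Series([pd.Timedelta('-1 days')]).abs()
print(duration[0])

# Mixed frame (hypothetical data): apply abs() to the numeric columns only
mixed = pd.DataFrame({'value': [-1.5, 2.0], 'label': ['a', 'b']})
numeric_abs = mixed.select_dtypes(include='number').abs()
print(numeric_abs)
```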

In [None]:
# Create a Series with negative values
s_neg = pd.Series([-1.10, 2, -3.33, 4])
print("Series with negative values:")
print(s_neg)

In [None]:
# Apply abs() to Series
s_abs = s_neg.abs()
print("\nAbsolute values:")
print(s_abs)

In [None]:
# Create a Series with complex numbers
s_complex = pd.Series([1.2 + 1j])
print("\nSeries with complex numbers:")
print(s_complex)

In [None]:
# Apply abs() to Series with complex numbers
s_complex_abs = s_complex.abs()
print("\nAbsolute values of complex numbers:")
print(s_complex_abs)

In [None]:
# Create a Series with Timedelta
s_timedelta = pd.Series([pd.Timedelta('1 days')])
print("\nSeries with Timedelta:")
print(s_timedelta)

In [None]:
# Apply abs() to Series with Timedelta
s_timedelta_abs = s_timedelta.abs()
print("\nAbsolute values of Timedelta:")
print(s_timedelta_abs)

In [None]:
# Create a DataFrame with mixed positive and negative values
df_mixed = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})
print("\nDataFrame with mixed values:")
print(df_mixed)

In [None]:
# Apply abs() to DataFrame
df_abs = df_mixed.abs()
print("\nAbsolute values of DataFrame:")
print(df_abs)

In [None]:
# Example: Order rows by how close column 'c' is to a target value
target_value = 20
# argsort() returns integer positions, sorted from smallest to largest distance
closest_rows = (df_mixed['c'] - target_value).abs().argsort()
print(f"\nRows ordered by closeness of column 'c' to {target_value}:")
print(df_mixed.iloc[closest_rows])
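
When only the single closest row is needed, sorting every row is unnecessary. One alternative is `idxmin()` on the distance Series, sketched below. It rebuilds column `c` with the same values so the block runs on its own; `df_c`, `target`, and `distance` are illustrative names.

```python
import pandas as pd

df_c = pd.DataFrame({'c': [100, 50, -30, -50]})  # same values as column 'c' above
target = 20

# Absolute distance of each value from the target
distance = (df_c['c'] - target).abs()

# idxmin() returns the index label of the smallest distance
closest_label = distance.idxmin()
print(df_c.loc[closest_label])
```

Note that `idxmin()` returns an index label, so it pairs with `.loc`, whereas `argsort()` returns integer positions and pairs with `.iloc`.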

## Practical Example: Data Normalization and Scaling

In [None]:
# Create a DataFrame of simulated student test scores
np.random.seed(42)  # For reproducibility
# randint's upper bound is exclusive; negative values stand in for sign errors
scores = pd.DataFrame({
    'math': np.random.randint(-100, 100, 10),
    'science': np.random.randint(-100, 100, 10),
    'english': np.random.randint(-100, 100, 10)
})
print("Student test scores (with errors):")
print(scores)

In [None]:
# Correct the errors by taking absolute values
corrected_scores = scores.abs()
print("\nCorrected test scores:")
print(corrected_scores)

In [None]:
# Calculate z-scores (standardization)
z_scores = (corrected_scores - corrected_scores.mean()) / corrected_scores.std()
print("\nZ-scores:")
print(z_scores)

In [None]:
# Min-max scaling to [0, 1]
min_max_scaled = (corrected_scores - corrected_scores.min()) / (corrected_scores.max() - corrected_scores.min())
print("\nMin-max scaled scores:")
print(min_max_scaled)
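
Both transformations have properties that are easy to verify. After z-scoring, each column has mean 0 and standard deviation 1. After min-max scaling, each column spans exactly 0 to 1. The sketch below checks this on a small hypothetical frame (`demo` and its values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical scores, small enough to check by hand
demo = pd.DataFrame({'math': [50, 60, 70], 'science': [10, 20, 60]})

# Z-score: subtract the column mean, divide by the column standard deviation
z = (demo - demo.mean()) / demo.std()

# Min-max: shift by the column minimum, divide by the column range
mm = (demo - demo.min()) / (demo.max() - demo.min())

print(np.allclose(z.mean(), 0))  # column means are approximately 0
print(np.allclose(z.std(), 1))   # column standard deviations are approximately 1
print(mm.min().tolist(), mm.max().tolist())
```

pandas `std()` computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to `ddof=0`, so z-scores computed with the two libraries differ slightly unless `ddof` is set explicitly. A constant column has zero standard deviation and zero range, so both formulas would produce NaN for it.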

In [None]:
# Visualize the corrected and scaled data
fig, axes = plt.subplots(3, 1, figsize=(10, 12))

corrected_scores.plot(kind='bar', ax=axes[0], title='Corrected Scores')
axes[0].set_ylabel('Score')
axes[0].grid(axis='y', linestyle='--', alpha=0.7)

z_scores.plot(kind='bar', ax=axes[1], title='Z-Scores')
axes[1].set_ylabel('Z-Score')
axes[1].grid(axis='y', linestyle='--', alpha=0.7)

min_max_scaled.plot(kind='bar', ax=axes[2], title='Min-Max Scaled Scores')
axes[2].set_ylabel('Scaled Score')
axes[2].grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

## Conclusion

In this notebook, we've explored:

1. **DataFrame Constructor**: We've seen how to create DataFrames from various data sources, including dictionaries, lists, NumPy arrays, and Series objects. The DataFrame is the primary pandas data structure for working with tabular data.

2. **Converting Series to Other Formats**: We've explored methods for converting Series to various formats, including dictionaries, DataFrames, strings, JSON, CSV, and Markdown, which are useful for data interchange and storage.

3. **The `abs()` Method**: We've learned how to use the `abs()` method to get the absolute values of numeric elements in Series and DataFrames, which is useful for data cleaning, normalization, and analysis.

4. **Practical Applications**: We've seen how these methods can be applied in real-world scenarios, such as data normalization, standardization, and finding values closest to a target.

These tools and techniques are essential for data manipulation and analysis in pandas, providing a solid foundation for working with tabular data in Python.