
**I. Data Structures**

* **Series:** A one-dimensional array-like structure holding data of any type. Think of it like a column in a spreadsheet.
* **DataFrame:** A two-dimensional tabular structure with labeled rows and columns. This is what you'll use most often, as it resembles an Excel sheet or a database table.
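
A minimal sketch of the two structures; the names `prices` and `fruit` and all values are made up for illustration:

```python
import pandas as pd

# A Series: one-dimensional, labeled values (like a single spreadsheet column).
prices = pd.Series([1.5, 0.25, 3.0], index=["apple", "banana", "cherry"], name="price")

# A DataFrame: two-dimensional, with labeled rows and columns.
fruit = pd.DataFrame(
    {"price": [1.5, 0.25, 3.0], "stock": [10, 120, 0]},
    index=["apple", "banana", "cherry"],
)

# Each column of a DataFrame is itself a Series.
print(type(fruit["price"]))
print(fruit.shape)  # (3, 2)
```

Selecting one column from `fruit` gives back a Series, which is why most Series methods also work column by column on a DataFrame.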

**II. Essential Commands**

1. **Importing Data**
   * `pd.read_csv()` Read CSV files.
   * `pd.read_excel()` Read Excel files.
   * `pd.read_json()` Read JSON data.

2. **Exploring Data**
   * `df.head()` View the first few rows.
   * `df.tail()` View the last few rows.
   * `df.info()` Get summary information (data types, non-null counts, etc.).
   * `df.describe()` Generate descriptive statistics.
   * `df.shape` Get the dimensions of the DataFrame as a `(rows, columns)` tuple (an attribute, so no parentheses).

3. **Selecting Data**
   * `df['column_name']` Select a column by its name.
   * `df.loc[row_label]` Select rows by index label.
   * `df.iloc[row_position]` Select rows by integer position (0-based).
   * `df[df['column_name'] > value]` Conditional selection.  

4. **Data Manipulation**
   * `df.sort_values()` Sort by the values of one or more columns.
   * `df.groupby()` Group data and perform aggregations. 
   * `df.drop_duplicates()` Remove duplicate rows.
   * `df.dropna()` Drop rows or columns with missing values.
   * `df.fillna()` Fill missing values.
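
The commands above can be tried end to end on a tiny table. This is only a sketch: the CSV text, column names, and values are invented, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to pd.read_csv().
csv_text = """name,city,sales
Ama,Accra,120
Kofi,Kumasi,
Ama,Accra,120
Esi,Accra,90
Yaw,Kumasi,200
"""

df = pd.read_csv(io.StringIO(csv_text))

# Exploring
print(df.head(2))
print(df.shape)  # (5, 3)

# Selecting
first_row = df.iloc[0]               # by position
row_label_3 = df.loc[3]              # by index label (default labels are 0..n-1)
big_sales = df[df["sales"] > 100]    # conditional selection

# Manipulating
deduped = df.drop_duplicates()           # drops the repeated Ama/Accra/120 row
filled = deduped.fillna({"sales": 0})    # replaces the missing sales value with 0
by_city = filled.groupby("city")["sales"].sum()
ranked = filled.sort_values("sales", ascending=False)
```

With the default integer index, `df.loc[3]` and `df.iloc[3]` return the same row; they differ once the index holds other labels or has been reordered.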

**III. Concepts to Master**

* **Indexing and Slicing:** Accessing specific rows, columns, and cells with `[]`, `.loc` (labels), and `.iloc` (positions).
* **Boolean Indexing:** Using conditions to filter and select data.
* **Merging & Joining:** Combining DataFrames based on common keys.
* **Aggregation:** Calculating summary statistics using functions like `sum`, `mean`, `count`.
* **Handling Missing Data:** Detecting missing values with `isna` and deciding whether to drop them (`dropna`) or fill them (`fillna`).
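
As a rough illustration of merging, aggregation, and missing-data handling together, here is a sketch on two invented tables that share a `customer_id` key:

```python
import pandas as pd

# Two small, made-up tables that share the key column "customer_id".
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ama", "Kofi", "Esi"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [50.0, 25.0, 40.0, 10.0]})

# Inner join keeps only keys present in both tables (customers 2 and 4 drop out).
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join keeps every customer; those without orders get NaN in "amount".
left = customers.merge(orders, on="customer_id", how="left")

# Aggregation: several summary statistics per customer.
summary = inner.groupby("name")["amount"].agg(["sum", "mean", "count"])

# Handling missing data: count the missing values, then fill them.
n_missing = left["amount"].isna().sum()
left_filled = left.fillna({"amount": 0.0})

print(summary)
```

The choice of `how` decides which rows survive the merge, so it is worth comparing row counts before and after joining.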

**IV. Additional Areas to Explore (As you advance)**

* **Method Chaining:** Streamlining operations by chaining commands together.
* **Vectorized Operations:** Applying operations element-wise for efficiency.
* **Time Series Analysis:** Working with date and time data, resampling, etc.
* **Plotting with Matplotlib:** Basic visualization through `df.plot()`, which uses Matplotlib as its default backend.
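
A short sketch of three of these ideas on made-up daily readings (the column names `date`, `temp_c`, and `temp_f` are illustrative only):

```python
import pandas as pd

# Made-up daily temperature readings.
readings = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="D"),
        "temp_c": [20.0, 22.0, 21.0, 25.0, 24.0, 23.0],
    }
)

# Vectorized operation: the arithmetic applies to the whole column at once,
# with no explicit Python loop.
readings["temp_f"] = readings["temp_c"] * 9 / 5 + 32

# Method chaining: filter, sort, and reset the index in one expression.
warm_days = (
    readings[readings["temp_c"] > 21]
    .sort_values("temp_c", ascending=False)
    .reset_index(drop=True)
)

# Time series: use the dates as the index, then resample into 3-day bins.
three_day_mean = readings.set_index("date")["temp_c"].resample("3D").mean()

print(warm_days)
print(three_day_mean)
```

Wrapping a chain in parentheses lets each step sit on its own line, which keeps longer pipelines readable.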

**How to Practice**

* **Find datasets:**  Kaggle ([https://www.kaggle.com/datasets](https://www.kaggle.com/datasets)) is a great resource.
* **Work on projects:** Come up with analysis questions to answer, even with small datasets.
* **Official pandas documentation:** It's excellent!  ([https://pandas.pydata.org/docs/](https://pandas.pydata.org/docs/))



```python
import re

# Sample data with inconsistent spacing and punctuation
data = [
    "+233 54 275 1515",
    "+233  79  88  77  55",
    "+1 (234) 567 88 99",  # Include a US number for variation
]

# Keep only digits and the "+" sign; drop spaces, brackets, dashes, etc.
def clean_phone_number(number):
    """
    Remove every character except digits and '+' from a phone number.

    Args:
        number: The phone number string to clean.

    Returns:
        The cleaned phone number string, containing only digits and '+'.
    """
    return re.sub(r"[^\d+]", "", number)

# Clean phone numbers in the list
cleaned_data = [clean_phone_number(number) for number in data]

# Print the cleaned data
print(cleaned_data)
```



**1. Import the Libraries:**

```python
import pandas as pd
import re
```

**2. Define the Function:**

```python
def clean_phone_number(number):
    """
    Remove every character except digits and '+' from a phone number.

    Args:
        number: The phone number string to clean.

    Returns:
        The cleaned phone number string, containing only digits and '+'.
    """
    return re.sub(r"[^\d+]", "", number)
```

**3. Create a Sample DataFrame:**

```python
data = {
    'phone_number': ['+233 54 275 1515', '+233 20 123 4567', '+1 (555) 123-4567']
}
df = pd.DataFrame(data)
```

**4. Apply the Function:**

```python
df['cleaned_phone_number'] = df['phone_number'].apply(clean_phone_number)
```

**Explanation:**

* `df['cleaned_phone_number'] = ...` : This creates a new column in the DataFrame called 'cleaned_phone_number'.
* `df['phone_number'].apply(clean_phone_number)` :  The `apply` method takes the `clean_phone_number` function and applies it to each element in the 'phone_number' column, cleaning the data.
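
For a simple pattern like this one, the built-in string accessor is a possible alternative to `apply`. The sketch below assumes the same sample data and the same regular expression as `clean_phone_number`:

```python
import pandas as pd

df = pd.DataFrame(
    {"phone_number": ["+233 54 275 1515", "+233 20 123 4567", "+1 (555) 123-4567"]}
)

# Same pattern as clean_phone_number: drop everything except digits and "+".
# regex=True is passed explicitly because recent pandas versions treat the
# pattern as a literal string by default.
df["cleaned_phone_number"] = df["phone_number"].str.replace(r"[^\d+]", "", regex=True)

print(df["cleaned_phone_number"].tolist())
```

Unlike a plain `re.sub` inside `apply`, the `.str` methods pass missing values through as missing instead of raising an error.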

**Result:**

The DataFrame `df` will now have an additional column 'cleaned_phone_number' containing the cleaned phone numbers:

```
        phone_number  cleaned_phone_number
0   +233 54 275 1515         +233542751515
1   +233 20 123 4567         +233201234567
2  +1 (555) 123-4567          +15551234567
```

**Important Note:** If your phone number data is stored as integers, convert it to strings before applying `clean_phone_number`, because `re.sub` only accepts strings. You can do this using `df['phone_number'] = df['phone_number'].astype(str)`.
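
A small sketch of that conversion, assuming a hypothetical column that was loaded as integers:

```python
import re
import pandas as pd

def clean_phone_number(number):
    """Keep only digits and '+' (same pattern as above)."""
    return re.sub(r"[^\d+]", "", number)

# Hypothetical column where the numbers were loaded as integers.
df = pd.DataFrame({"phone_number": [233542751515, 233201234567]})

# re.sub raises a TypeError on integers, so convert to strings first.
df["phone_number"] = df["phone_number"].astype(str)
df["cleaned_phone_number"] = df["phone_number"].apply(clean_phone_number)

print(df["cleaned_phone_number"].tolist())
```

An integer column cannot hold a leading `+` or leading zeros, so those are already lost before cleaning; reading the column as text in the first place avoids this.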

