
First, let's create a sample text file named `sample.txt` with some content. You can replace this with your own file later.

In [None]:
sample_text = """This is a sample text file. This file contains some sample text to demonstrate word frequency counting. Sample text is useful."""

with open('sample.txt', 'w') as f:
    f.write(sample_text)

print("Created 'sample.txt' with sample content.")

Created 'sample.txt' with sample content.


Now, let's write the Python code to read this text file, process its content, and count the frequency of each word. We'll use the `collections.Counter` class for efficient counting.
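For readers new to `Counter`, here is a minimal illustration of the three behaviours the word counter below relies on: counting the items of an iterable, reading unseen keys as 0, and ranking with `most_common()`. The word list is made up for illustration.

```python
from collections import Counter

# A small, made-up list of words standing in for a tokenized file
words = ["sample", "text", "this", "sample", "text", "sample"]

counts = Counter(words)       # maps each distinct word to its number of occurrences
print(counts["sample"])       # 3
print(counts["missing"])      # 0: unseen keys read as zero instead of raising KeyError
print(counts.most_common(2))  # [('sample', 3), ('text', 2)]
```

Note that looking up a missing key does not add it to the `Counter`, so `"missing" in counts` stays `False`.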

In [None]:
import re
from collections import Counter

def count_word_frequency(filepath):
    """
    Reads a text file, counts the frequency of each word,
    and returns a Counter object.
    """
    all_words = []
    try:
        with open(filepath, 'r', encoding='utf-8') as file:
            text = file.read()
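            # Drop every character that is not an ASCII letter or whitespace
            # (punctuation, but also digits, apostrophes, and accented letters),
            # then lowercase so "Sample" and "sample" count as the same word
            cleaned_text = re.sub(r'[^a-zA-Z\s]', '', text).lower()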
            # Split the text into words
            words = cleaned_text.split()
            all_words.extend(words)
    except FileNotFoundError:
        print(f"Error: The file '{filepath}' was not found.")
        return Counter()
    except Exception as e:
        print(f"An error occurred: {e}")
        return Counter()

    # Count word frequencies
    word_counts = Counter(all_words)
    return word_counts

# Specify the path to your text file
file_path = 'sample.txt'

# Call the function to count word frequencies
frequencies = count_word_frequency(file_path)

# Display the top 10 most common words
print(f"\nWord frequencies in '{file_path}':")
if frequencies:
    for word, count in frequencies.most_common(10):
        print(f"{word}: {count}")
else:
    print("No words found or an error occurred.")


Word frequencies in 'sample.txt':
sample: 3
text: 3
this: 2
is: 2
file: 2
a: 1
contains: 1
some: 1
to: 1
demonstrate: 1
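The `re.sub` call above keeps only ASCII letters, so contractions are merged ("don't" becomes "dont") and digits disappear. As a sketch of an alternative, assuming a word is a run of ASCII letters that may contain internal apostrophes, the hypothetical `count_words_by_line` below tokenizes with `re.findall` and feeds a `Counter` one line at a time, which also avoids reading a large file into memory all at once.

```python
import re
from collections import Counter

# Assumed word pattern: ASCII letters, optionally joined by apostrophes ("don't")
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def count_words_by_line(lines):
    """Count words from any iterable of strings, e.g. an open text file.

    Only one line is held in memory at a time; the Counter grows with the
    number of distinct words, not with the size of the file.
    """
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts

lines = ["Don't panic.\n", "Version 2 doesn't panic either!\n"]
counts = count_words_by_line(lines)
print(counts["panic"])  # 2
print(counts["don't"])  # 1: the apostrophe is kept
print("2" in counts)    # False: digits are not words under this pattern
```

With a real file, the file object can be passed directly, since iterating over it yields lines: `with open('sample.txt', encoding='utf-8') as f: counts = count_words_by_line(f)`.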


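Next, let's compare two ways of sorting a list. First, we implement the **Bubble Sort** algorithm by hand. It is a simple comparison-based algorithm that repeatedly swaps adjacent elements that are in the wrong order.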

In [None]:
def bubble_sort(arr):
    """Sort arr in place in ascending order and return it."""
    n = len(arr)
    # Make n - 1 passes; each pass bubbles the largest remaining element to the end
    for i in range(n - 1):
        # The last i elements are already in their final positions
        for j in range(0, n - i - 1):
            # Compare the adjacent pair arr[j], arr[j + 1] (j runs from 0 to n-i-2)
            # and swap them if they are in the wrong order
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

# Test Bubble Sort
my_list = [64, 34, 25, 12, 22, 11, 90]
print(f"Original list: {my_list}")
sorted_list_bubble = bubble_sort(list(my_list)) # Use a copy to avoid modifying original
print(f"Sorted using Bubble Sort: {sorted_list_bubble}")

Original list: [64, 34, 25, 12, 22, 11, 90]
Sorted using Bubble Sort: [11, 12, 22, 25, 34, 64, 90]
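The implementation above always performs n(n-1)/2 comparisons, even when the list is already sorted. A common variant stops as soon as a full pass makes no swaps. The sketch below (the name `bubble_sort_early_exit` is hypothetical) also returns the number of comparisons, so the O(n) best case on sorted input can be observed directly.

```python
def bubble_sort_early_exit(arr):
    """Bubble sort that stops once a full pass makes no swaps.

    Sorts arr in place and returns (arr, number_of_comparisons).
    """
    n = len(arr)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - i - 1):
            comparisons += 1
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:
            break  # a pass without swaps means the list is sorted
    return arr, comparisons

print(bubble_sort_early_exit([11, 12, 22, 25, 34, 64, 90]))  # ([11, 12, 22, 25, 34, 64, 90], 6)
print(bubble_sort_early_exit([64, 34, 25, 12, 22, 11, 90]))  # ([11, 12, 22, 25, 34, 64, 90], 21)
```

On the already-sorted list a single pass of n - 1 = 6 comparisons is enough; on the original test list the early exit does not help, and all 21 comparisons are made.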


Next, let's use Python's **built-in `sort()` method** (for lists) or `sorted()` function (for any iterable). These are highly optimized and generally preferred in practice.

In [None]:
# Test Python's built-in sort()
my_list_2 = [64, 34, 25, 12, 22, 11, 90]
print(f"Original list: {my_list_2}")
my_list_2.sort() # Sorts the list in-place
print(f"Sorted using built-in sort() method: {my_list_2}")

# Test Python's built-in sorted() function
my_tuple = (64, 34, 25, 12, 22, 11, 90)
print(f"Original tuple: {my_tuple}")
sorted_from_tuple = sorted(my_tuple) # Returns a new list (not a tuple) from any iterable
print(f"Sorted using built-in sorted() function: {sorted_from_tuple}")

Original list: [64, 34, 25, 12, 22, 11, 90]
Sorted using built-in sort() method: [11, 12, 22, 25, 34, 64, 90]
Original tuple: (64, 34, 25, 12, 22, 11, 90)
Sorted using built-in sorted() function: [11, 12, 22, 25, 34, 64, 90]
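Both `sort()` and `sorted()` also accept `key` and `reverse` arguments. The sketch below, using made-up `(name, age)` records, shows that `key` chooses what each item is compared by, that equal keys keep their original order even with `reverse=True` (the sort is stable), and that `list.sort()` returns `None` because it works in place.

```python
# Made-up (name, age) records for illustration
records = [("alice", 30), ("bob", 25), ("carol", 30), ("dave", 25)]

# sorted() returns a new list; key picks the value each record is compared by
by_age = sorted(records, key=lambda r: r[1])
print(by_age)  # [('bob', 25), ('dave', 25), ('alice', 30), ('carol', 30)]

# reverse=True flips the order of keys but keeps ties in their original order
oldest_first = sorted(records, key=lambda r: r[1], reverse=True)
print(oldest_first)  # [('alice', 30), ('carol', 30), ('bob', 25), ('dave', 25)]

# list.sort() sorts in place and returns None, so don't assign its result
result = records.sort(key=lambda r: r[0])
print(result, records[0])  # None ('alice', 30)
```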


### Comparison of Bubble Sort vs. Python's Built-in Sort

| Feature           | Bubble Sort                                         | Python's Built-in `sort()` / `sorted()`                               |
| :---------------- | :-------------------------------------------------- | :-------------------------------------------------------------------- |
| **Algorithm**     | Simple comparison-based, repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. | Timsort (a hybrid stable sorting algorithm, derived from merge sort and insertion sort). Highly optimized. |
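| **Time Complexity** | **Worst/Average:** O(n^2) (very inefficient for large datasets). **Best:** O(n) on already-sorted input, but only if the implementation stops after a pass with no swaps; the version above always makes n(n-1)/2 comparisons. | **Worst/Average:** O(n log n). **Best:** O(n) when the input is already sorted (or strictly descending), because Timsort detects the existing run. |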
| **Space Complexity** | O(1) extra space (sorts in place).                  | O(n) auxiliary space in the worst case (Timsort's merge buffer); `sorted()` also builds a new list. |
| **Stability**     | Stable (maintains the relative order of equal elements). | Stable.                                                                |
| **Ease of Use**   | Requires manual implementation.                     | Very easy to use; built directly into Python.                         |
| **When to Use**   | Primarily for educational purposes to understand basic sorting concepts. Not recommended for production. | Almost always preferred for general-purpose sorting in Python due to its efficiency and reliability. |

**In summary:** While Bubble Sort is easy to understand, Python's built-in `sort()` method and `sorted()` function are far more efficient (O(n log n) versus O(n^2) on typical input), thoroughly tested, and the right choice for practical applications.
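To see the best-case difference in practice, one can count how many `<` comparisons the built-in sort makes (Python's sort routines compare items only with `<`). This sketch wraps each value in a hypothetical `Counted` class whose `__lt__` increments a counter. On already-sorted input the count should stay close to n, while a scrambled permutation of the same values needs many more; the exact numbers depend on the Python version, so none are hard-coded here.

```python
class Counted:
    """Wraps a value and counts every '<' comparison made between wrappers."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

def count_comparisons(data):
    """Sort data with the built-in sorted() and return the comparison count."""
    Counted.comparisons = 0
    sorted(data, key=Counted)
    return Counted.comparisons

n = 1000
already_sorted = list(range(n))
# 617 shares no factor with 1000, so this is a permutation of range(1000)
scrambled = [(i * 617) % n for i in range(n)]

print("already sorted:", count_comparisons(already_sorted))
print("scrambled:     ", count_comparisons(scrambled))
```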

In [None]:
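import pandas as pd
from IPython.display import display  # predefined in Jupyter/Colab; explicit import for other environments

# Path to the CSV file (/content is Colab's default working directory)
csv_file_path = '/content/data.csv'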

try:
    # Read the CSV file into a pandas DataFrame
    df = pd.read_csv(csv_file_path)
    print(f"Successfully loaded data from '{csv_file_path}'.")

    # Display the first few rows of the DataFrame to understand its structure
    print("\nFirst 5 rows of the DataFrame:")
    display(df.head())

    # Select only numerical columns for calculations
    numerical_df = df.select_dtypes(include=['number'])

    if not numerical_df.empty:
        # Calculate mean, minimum, and maximum for numerical columns
        mean_values = numerical_df.mean()
        min_values = numerical_df.min()
        max_values = numerical_df.max()

        print("\nCalculated Statistics (Mean, Min, Max):\n")
        print("Mean values:")
        display(mean_values)
        print("\nMinimum values:")
        display(min_values)
        print("\nMaximum values:")
        display(max_values)
    else:
        print("No numerical columns found in the CSV to calculate statistics.")

except FileNotFoundError:
    print(f"Error: The file '{csv_file_path}' was not found. Please ensure the path is correct and the file exists.")
except pd.errors.EmptyDataError:
    print(f"Error: The file '{csv_file_path}' is empty.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Successfully loaded data from '/content/data.csv'.

First 5 rows of the DataFrame:


|   | Duration | Pulse | Maxpulse | Calories |
|---|---|---|---|---|
| 0 | 60 | 110 | 130 | 409.1 |
| 1 | 60 | 117 | 145 | 479.0 |
| 2 | 60 | 103 | 135 | 340.0 |
| 3 | 45 | 109 | 175 | 282.4 |
| 4 | 45 | 117 | 148 | 406.0 |



Calculated Statistics (Mean, Min, Max):

Mean values:


|   | 0 |
|---|---|
| Duration | 63.846154 |
| Pulse | 107.461538 |
| Maxpulse | 134.047337 |
| Calories | 375.790244 |



Minimum values:


|   | 0 |
|---|---|
| Duration | 15.0 |
| Pulse | 80.0 |
| Maxpulse | 100.0 |
| Calories | 50.3 |



Maximum values:


|   | 0 |
|---|---|
| Duration | 300.0 |
| Pulse | 159.0 |
| Maxpulse | 184.0 |
| Calories | 1860.4 |


### Code Explanation:

1.  **`import pandas as pd`**: Imports the pandas library, which provides the DataFrame, a table-like data structure used throughout this cell.
2.  **`csv_file_path = '/content/data.csv'`**: Defines the path to the CSV file. `/content` is Colab's default working directory; adjust the path if your file is stored elsewhere.
3.  **`try...except` block**: This block is used for robust error handling:
    *   **`df = pd.read_csv(csv_file_path)`**: Attempts to read the CSV file into a pandas DataFrame named `df`.
    *   **`display(df.head())`**: Shows the first 5 rows of the loaded DataFrame. This helps verify that the data was loaded correctly and gives an idea of its structure.
    *   **`numerical_df = df.select_dtypes(include=['number'])`**: Keeps only the columns with numerical data types (integers and floats). The mean is only meaningful for numbers, and in pandas 2.x `DataFrame.mean()` raises a `TypeError` if text columns are present.
    *   **`mean_values = numerical_df.mean()`**: Calculates the mean (average) for each numerical column.
    *   **`min_values = numerical_df.min()`**: Calculates the minimum value for each numerical column.
    *   **`max_values = numerical_df.max()`**: Calculates the maximum value for each numerical column.
    *   **`display(...)`**: Renders each resulting Series (one value per numerical column) as a formatted table in the notebook output. `display` is available by default in Jupyter and Colab.
    *   **Error Handling**: Catches `FileNotFoundError` if the CSV doesn't exist, `pd.errors.EmptyDataError` if the file is empty, and a general `Exception` for any other issue, printing an informative message in each case.
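The three separate `mean()`, `min()`, and `max()` calls can also be collapsed into one table with `DataFrame.agg`. The sketch below uses the first five rows shown in the output above plus a hypothetical text column, `Session`, to show that `select_dtypes` drops non-numeric data before aggregation. `count` is included because `mean`, `min`, and `max` skip missing values by default, so it shows how many values each statistic is based on.

```python
import pandas as pd

# First five rows of data.csv as displayed above, plus a hypothetical text column
df = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45],
    "Pulse": [110, 117, 103, 109, 117],
    "Maxpulse": [130, 145, 135, 175, 148],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0],
    "Session": ["a", "b", "c", "d", "e"],  # hypothetical, not in data.csv
})

numerical_df = df.select_dtypes(include=["number"])  # drops the text column

# One row per statistic, one column per numerical column;
# "count" reports non-missing values, which mean/min/max skip by default
summary = numerical_df.agg(["count", "mean", "min", "max"])
print(summary)
```

For a quick overview, `numerical_df.describe()` reports count, mean, standard deviation, minimum, quartiles, and maximum in a single call.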