
# **Importing Data into Notebook**

## Reading CSV

In [None]:
import pandas as pd

# Load the small MNIST training set that ships with Colab's sample data
df_csv = pd.read_csv('/content/sample_data/mnist_train_small.csv')
df_csv.head()

Unnamed: 0,6,0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,...,0.581,0.582,0.583,0.584,0.585,0.586,0.587,0.588,0.589,0.590
0,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,7,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
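
The column labels in the output above (`6`, `0`, `0.1`, ...) suggest that the file has no header row, so pandas used the first data row as column names. A minimal sketch of the difference, assuming a tiny header-less CSV held in memory (the `raw` string is a hypothetical stand-in, not the MNIST file itself):

```python
import io
import pandas as pd

# Hypothetical two-row, header-less CSV standing in for the real file
raw = "6,0,0\n5,0,0\n"

# Default behaviour: the first row is consumed as the column labels,
# and duplicate labels are renamed (0, 0.1, ...)
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every row as data and numbers the columns 0..n-1
no_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.columns.tolist())
print(no_header.shape)
```

If the first row is meant to be data, passing `header=None` to the original `pd.read_csv` call would likely be the appropriate change.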


In [None]:
import pandas as pd

# Create a dummy CSV file
data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df_dummy = pd.DataFrame(data)
df_dummy.to_csv('sample.csv', index=False)

# Now, read the dummy CSV file
df_csv = pd.read_csv('sample.csv')
df_csv.head()

|   | col1 | col2 |
|---|---|---|
| 0 | 1 | A |
| 1 | 2 | B |
| 2 | 3 | C |


## Reading JSON

In [None]:
import json

# Save the data dictionary to a JSON file
with open('sample.json', 'w') as f:
    json.dump(data, f)

# Read the data from the JSON file into a pandas DataFrame
df_json = pd.read_json('sample.json')

# Display the head of the DataFrame
df_json.head()

|   | col1 | col2 |
|---|---|---|
| 0 | 1 | A |
| 1 | 2 | B |
| 2 | 3 | C |



### Explaining Data Reading Functions in Pandas

Pandas provides convenient functions for reading data from various file formats. Two commonly used functions are `pandas.read_csv()` and `pandas.read_json()`.

#### `pandas.read_csv()`

This function is used to read a comma-separated values (CSV) file into a pandas DataFrame.

**Input Parameters:**

*   `filepath_or_buffer`: The path to the CSV file, or a file-like object. This is the only required parameter.
*   `sep`: The delimiter that separates fields. The default is `','`; pass another delimiter such as `'\t'` for tab-separated values.


**Output:**

The function returns a pandas DataFrame containing the data from the CSV file.
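
As an illustration of the `sep` parameter, the sketch below reads tab-separated text. It assumes the data is held in an in-memory buffer (`tsv_text` is a hypothetical example string) rather than a file on disk:

```python
import io
import pandas as pd

# Hypothetical tab-separated text, held in memory instead of a file
tsv_text = "col1\tcol2\n1\tA\n2\tB\n3\tC\n"

# sep="\t" tells read_csv to split fields on tabs instead of commas
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_tsv)
```

The same call works with a file path in place of the `io.StringIO` buffer.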

#### `pandas.read_json()`

This function is used to read a JSON file into a pandas DataFrame.

**Input Parameters:**

*   `path_or_buf`: The path to the JSON file, or a file-like object.
*   `orient`: The expected layout of the JSON data. For a DataFrame, accepted values include `'columns'` (the default), `'index'`, `'records'`, `'split'`, and `'values'`.

**Output:**

The function returns a pandas DataFrame containing the data from the JSON file.
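
To make the effect of `orient` concrete, here is a small sketch that reads the same data from two different JSON layouts. The JSON strings are hypothetical examples wrapped in `io.StringIO` so the snippet does not depend on any file:

```python
import io
import pandas as pd

# Column-oriented JSON: one key per column
columns_json = '{"col1": [1, 2, 3], "col2": ["A", "B", "C"]}'

# Record-oriented JSON: one object per row
records_json = (
    '[{"col1": 1, "col2": "A"},'
    ' {"col1": 2, "col2": "B"},'
    ' {"col1": 3, "col2": "C"}]'
)

df_columns = pd.read_json(io.StringIO(columns_json), orient="columns")
df_records = pd.read_json(io.StringIO(records_json), orient="records")

print(df_columns)
print(df_records)
```

Both calls should yield a three-row frame with columns `col1` and `col2`; only the layout of the input differs.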

## Basic Operations on the DataFrame

In [None]:
print("Operations on df_csv:")
display(df_csv.head())
# info() prints its summary and returns None, so it is called directly
# rather than wrapped in display()
df_csv.info()
display(df_csv.describe())

print("\nOperations on df_json:")
display(df_json.head())
df_json.info()
display(df_json.describe())

Operations on df_csv:


|   | col1 | col2 |
|---|---|---|
| 0 | 1 | A |
| 1 | 2 | B |
| 2 | 3 | C |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   col1    3 non-null      int64 
 1   col2    3 non-null      object
dtypes: int64(1), object(1)
memory usage: 180.0+ bytes


|   | col1 |
|---|---|
| count | 3.0 |
| mean | 2.0 |
| std | 1.0 |
| min | 1.0 |
| 25% | 1.5 |
| 50% | 2.0 |
| 75% | 2.5 |
| max | 3.0 |



Operations on df_json:


|   | col1 | col2 |
|---|---|---|
| 0 | 1 | A |
| 1 | 2 | B |
| 2 | 3 | C |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   col1    3 non-null      int64 
 1   col2    3 non-null      object
dtypes: int64(1), object(1)
memory usage: 180.0+ bytes


|   | col1 |
|---|---|
| count | 3.0 |
| mean | 2.0 |
| std | 1.0 |
| min | 1.0 |
| 25% | 1.5 |
| 50% | 2.0 |
| 75% | 2.5 |
| max | 3.0 |


### Explanation of Basic Data Operations

After loading data into a pandas DataFrame, several basic operations are commonly used to get a quick overview of the data's structure and content.

#### `.head()`

The `.head()` method is used to display the first few rows of the DataFrame. By default, it shows the first 5 rows, but you can specify a different number of rows as an argument (e.g., `.head(10)`). This is useful for quickly inspecting the structure and content of the data.
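
A short sketch of the row-count argument, using a hypothetical ten-row frame (`df_demo`) built just for illustration:

```python
import pandas as pd

# Hypothetical ten-row frame used only to show the effect of the argument
df_demo = pd.DataFrame({"value": range(10)})

first_five = df_demo.head()          # default: first 5 rows
first_three = df_demo.head(3)        # explicit row count
all_but_last_two = df_demo.head(-2)  # negative n: everything except the last 2 rows

print(len(first_five), len(first_three), len(all_but_last_two))
```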

#### `.info()`

The `.info()` method provides a concise summary of the DataFrame. It includes information such as the number of entries, the number of columns, the names of the columns, the number of non-null values in each column, the data type of each column, and the memory usage of the DataFrame. This method is crucial for understanding the data types and identifying missing values.
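
The sketch below shows how `.info()` exposes missing values, assuming a small hypothetical frame (`df_missing`) with one gap per column. Note that `.info()` writes its report to standard output (or to the `buf` argument) and returns `None`:

```python
import io
import pandas as pd

# Hypothetical frame with one missing value in each column
df_missing = pd.DataFrame(
    {"col1": [1.0, None, 3.0], "col2": ["A", "B", None]}
)

# Capture the report in a buffer instead of printing it directly
buffer = io.StringIO()
result = df_missing.info(buf=buffer)  # result is None
summary = buffer.getvalue()
print(summary)

# isna().sum() gives the number of missing values per column directly
missing_counts = df_missing.isna().sum()
print(missing_counts)
```

Comparing each column's non-null count with the total number of entries is what reveals the gaps; `isna().sum()` is a more direct way to get the same figure.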

#### `.describe()`

By default, the `.describe()` method generates descriptive statistics for the DataFrame's numerical columns only. It calculates and displays the count, mean, standard deviation, minimum, maximum, and the quartiles (25th, 50th, and 75th percentiles). This method is helpful for getting a quick statistical summary of the numerical data.
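
As a rough illustration, the sketch below rebuilds the three-row sample frame (named `df_demo` here as an assumption, to keep the snippet self-contained) and compares the default summary with `include="all"`, which also covers non-numeric columns:

```python
import pandas as pd

# Same shape as the sample data used earlier, rebuilt for a self-contained example
df_demo = pd.DataFrame({"col1": [1, 2, 3], "col2": ["A", "B", "C"]})

# Default: numeric columns only
numeric_summary = df_demo.describe()

# include="all" adds rows such as unique, top, and freq for non-numeric columns
full_summary = df_demo.describe(include="all")

print(numeric_summary)
print(full_summary)
```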

## Using NumPy to read a CSV file

In [None]:
import numpy as np

# Use numpy.genfromtxt to read sample.csv. dtype=str keeps every field as text,
# so the header row is returned as the first row of the array.
numpy_data = np.genfromtxt('sample.csv', delimiter=',', dtype=str)

# Print the loaded data
print(numpy_data)

[['col1' 'col2']
 ['1' 'A']
 ['2' 'B']
 ['3' 'C']]
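
Because `dtype=str` is used above, the header comes back as data and the numbers stay as strings. One possible way to get a numeric array instead is sketched below, assuming the same CSV content held in a hypothetical in-memory string (`csv_text`):

```python
import io
import numpy as np

# Same content as sample.csv, held in memory for a self-contained example
csv_text = "col1,col2\n1,A\n2,B\n3,C\n"

# skip_header=1 drops the header row; usecols=(0,) keeps only the numeric column
col1 = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", skip_header=1, usecols=(0,)
)

print(col1, col1.dtype)
```

With the default `dtype`, `genfromtxt` parses the selected column as floating-point values.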


## Using Pickle

In [None]:
import pickle

# Define a file name
pickle_file = 'data.pkl'

# Save the DataFrame using pickle
with open(pickle_file, 'wb') as f:
    pickle.dump(df_csv, f)

print(f"DataFrame saved to {pickle_file}")

# Load the DataFrame using pickle
with open(pickle_file, 'rb') as f:
    loaded_df = pickle.load(f)

print(f"DataFrame loaded from {pickle_file}")

# Display the loaded DataFrame to verify
display(loaded_df)

DataFrame saved to data.pkl
DataFrame loaded from data.pkl


|   | col1 | col2 |
|---|---|---|
| 0 | 1 | A |
| 1 | 2 | B |
| 2 | 3 | C |
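
pandas also offers `DataFrame.to_pickle()` and `pandas.read_pickle()`, which wrap the same round trip in one call each. The sketch below is an alternative to the `pickle` module calls above, not a replacement for them; it assumes a rebuilt sample frame (`df_demo`) and writes to a temporary directory so nothing is left on disk:

```python
import os
import tempfile
import pandas as pd

# Rebuilt sample frame so the snippet runs on its own
df_demo = pd.DataFrame({"col1": [1, 2, 3], "col2": ["A", "B", "C"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")
    df_demo.to_pickle(path)          # save the DataFrame
    restored = pd.read_pickle(path)  # load it back

# equals() checks that values, columns, and dtypes survived the round trip
print(restored.equals(df_demo))
```

Whichever route is used, pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.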
