Pandas is an open-source data manipulation and analysis library for the Python programming language, built on top of NumPy. It provides data structures and functions that make it easy to work with structured data such as tables, time series, and heterogeneous data.

Here are some key features of Pandas:

1. Data structures: Pandas provides two main data structures, Series and DataFrame. A Series is a one-dimensional labeled array, and a DataFrame is a two-dimensional labeled table, similar to a spreadsheet or SQL table, whose columns can hold different data types.

2. Data manipulation: Pandas offers a wide range of tools for cleaning, transforming, and analyzing data. It includes functions for filtering, sorting, grouping, merging, joining, reshaping, and aggregating data.

3. Missing data handling: Pandas provides methods for handling missing data, including filling in missing values, removing rows or columns with missing values, and interpolating missing values.

4. Time series functionality: Pandas has robust support for working with time series data, including date/time indexing, resampling, and frequency conversion.

5. Input/output tools: Pandas supports reading and writing data in various formats, including CSV, Excel, SQL databases, JSON, HTML, and HDF5.

6. Integration with other libraries: Pandas integrates well with other popular Python libraries for data analysis and visualization, such as NumPy, Matplotlib, Seaborn, and Scikit-learn.
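The sketch below illustrates features 1, 3, and 4 on small made-up data: a Series, a DataFrame built from it, the three missing-data strategies (fill, drop, interpolate), and resampling daily readings to weekly means. The names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series: one-dimensional values with index labels (Charlie's age is missing)
ages = pd.Series([25, 30, np.nan, 40], index=["Alice", "Bob", "Charlie", "David"])

# A DataFrame: a two-dimensional table; Bob's city is missing
df = pd.DataFrame({"Age": ages, "City": ["New York", None, "Chicago", "Houston"]})

# Missing-data handling
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})  # fill per column
dropped = df.dropna()                                              # drop rows with any missing value
interpolated = ages.interpolate()                                  # linear interpolation between neighbors

print(filled)
print(dropped)
print(interpolated)

# Time series: 14 daily readings resampled to weekly (Sunday-ending) means
idx = pd.date_range("2024-01-01", periods=14, freq="D")
readings = pd.Series(range(14), index=idx)
weekly = readings.resample("W").mean()
print(weekly)
```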

Real-world scenarios where Pandas is commonly used include:

1. Data cleaning and preprocessing: A dataset often needs cleaning and preprocessing before any analysis or modeling. Pandas provides powerful tools for handling missing values, removing duplicates, converting data types, and restructuring data.

2. Data analysis and exploration: Pandas makes it easy to perform exploratory data analysis (EDA) by computing descriptive statistics, generating summary tables, and visualizing data using built-in plotting functions or integration with Matplotlib and Seaborn.

3. Time series analysis: Pandas is widely used for analyzing time series data, such as financial data, sensor data, or weather data. It provides functionalities for resampling, time shifting, rolling window calculations, and analyzing trends and seasonality.

4. Data manipulation and transformation: Pandas is invaluable for transforming and reshaping data, such as pivoting tables, merging multiple datasets, and performing group-by operations.

5. Data integration and preparation for machine learning: Pandas can combine data from various sources, clean and preprocess it, and prepare it for machine learning models. It lets users select features and normalize columns, and it is commonly paired with scikit-learn for scaling and for splitting datasets into training and testing sets.
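As a rough illustration of scenario 4, here is a minimal sketch that merges two small tables on a shared key, pivots the result, and runs a group-by. The `people` and `sales` tables and their columns are hypothetical.

```python
import pandas as pd

# Two small, made-up tables sharing a 'Name' key
people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                       "City": ["New York", "Chicago", "Chicago"]})
sales = pd.DataFrame({"Name": ["Alice", "Bob", "Bob", "Charlie"],
                      "Quarter": ["Q1", "Q1", "Q2", "Q2"],
                      "Amount": [100, 200, 50, 300]})

# Merge: a SQL-style left join on the shared key
merged = sales.merge(people, on="Name", how="left")

# Pivot: one row per City, one column per Quarter, summing Amount (0 where empty)
pivot = merged.pivot_table(index="City", columns="Quarter",
                           values="Amount", aggfunc="sum", fill_value=0)
print(pivot)

# Group-by: total amount per person
totals = merged.groupby("Name")["Amount"].sum()
print(totals)
```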

Overall, Pandas is a versatile and powerful library that simplifies data manipulation and analysis tasks in Python, making it an essential tool for data scientists, analysts, and developers working with structured data.

In [None]:
from google.colab import files
uploaded = files.upload()

Saving NewData.csv to NewData.csv
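In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its contents as bytes, so the data can also be parsed straight from memory. Because `google.colab` is only available inside Colab, this sketch uses a hand-made stand-in dictionary; its CSV contents are assumed for illustration.

```python
import io
import pandas as pd

# Stand-in for the dict returned by google.colab.files.upload():
# {file name: file contents as bytes}. These contents are illustrative.
uploaded = {"NewData.csv": b"Name,Age,City\nRahul,32,Mathura\nRaj,30,New York\nRam,35,Lucknow\n"}

# Parse each uploaded file's bytes directly, without reading it back from disk
for name, content in uploaded.items():
    frame = pd.read_csv(io.BytesIO(content))
    print(name)
    print(frame)
```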


In [None]:
# Creating a DataFrame

import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
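A dictionary of columns is only one way to build a DataFrame. This short sketch shows two other common constructors, a list of row dictionaries and a list of lists with explicit column names, which here produce the same table.

```python
import pandas as pd

# From a list of dicts: one dict per row; keys become column names
rows = [{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]
df_rows = pd.DataFrame(rows)

# From a list of lists: column names must be supplied explicitly
df_lists = pd.DataFrame([["Alice", 25], ["Bob", 30]], columns=["Name", "Age"])

print(df_rows.equals(df_lists))  # True: same values, columns, and dtypes
print(df_rows.shape)             # (2, 2)
print(df_rows.dtypes)
```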


In [None]:
import pandas as pd

StudentSet = {'SName' : ['S1', 'S2', 'S3'],
              'SRollNo' : ['GLA123', 'GLA234', 'GLA356'],
              'Marks': [34, 45, 67]
              }
myStudData = pd.DataFrame(StudentSet)
print(myStudData)

  SName SRollNo  Marks
0    S1  GLA123     34
1    S2  GLA234     45
2    S3  GLA356     67


In [None]:
import pandas as pd
mydata = {"Student_Name" : ['S1', 'S2', 'S3', 'S4', 'S5'],
          "Marks": [60, 75, 88, 89, 90],
          "Section": ['CF','CA','CF','CB', 'EA']}
myDisplay = pd.DataFrame(mydata)
print(myDisplay)

  Student_Name  Marks Section
0           S1     60      CF
1           S2     75      CA
2           S3     88      CF
3           S4     89      CB
4           S5     90      EA
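Once the table exists, sorting and column selection are typical next steps. This sketch reuses the `mydata` dictionary from the cell above to rank students by marks, select columns, look up rows with `.loc`, and count students per section.

```python
import pandas as pd

mydata = {"Student_Name": ['S1', 'S2', 'S3', 'S4', 'S5'],
          "Marks": [60, 75, 88, 89, 90],
          "Section": ['CF', 'CA', 'CF', 'CB', 'EA']}
myDisplay = pd.DataFrame(mydata)

# Sort by Marks, highest first, and keep the top three
top = myDisplay.sort_values("Marks", ascending=False)
print(top.head(3))

# Select a subset of columns
print(myDisplay[["Student_Name", "Marks"]])

# Select the names of students in section CF with .loc (row condition, column label)
print(myDisplay.loc[myDisplay["Section"] == "CF", "Student_Name"])

# Count how many students are in each section
section_counts = myDisplay["Section"].value_counts()
print(section_counts)
```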


In [None]:
# Assuming we have a CSV file named 'data.csv' with columns: Name, Age, City
df = pd.read_csv('data.csv')
print(df.head())  # Display first few rows

In [None]:
myData = pd.read_csv('NewData.csv')
print(myData.head())

    Name  Age      City
0  Rahul   32   Mathura
1    Raj   30  New York
2    Ram   35   Lucknow


In [None]:
import pandas as pd
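# sample.csv is tab-delimited, but read_csv splits on commas by default,
# so each line lands in a single column (see the output below).
# Pass sep='\t' to parse it into separate Name, Age, and City columns.
myDetails = pd.read_csv('sample.csv')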
print(myDetails.head())

      Name\tAge\tCity
0  Rahul\t32\tMathura
1   Raj\t30\tNew York
2    Ram\t35\tLucknow
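The single merged column above is what a tab-separated file looks like when read with the default comma separator. This sketch reproduces it with an in-memory string (its contents are assumed to mirror sample.csv) and shows that `sep="\t"` splits the fields correctly.

```python
import io
import pandas as pd

# Illustrative tab-separated text mirroring sample.csv
tsv_text = "Name\tAge\tCity\nRahul\t32\tMathura\nRaj\t30\tNew York\nRam\t35\tLucknow\n"

wrong = pd.read_csv(io.StringIO(tsv_text))            # default sep=','
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-delimited

print(wrong.shape)  # (3, 1): every line in one column
print(right.shape)  # (3, 3)
print(right)
```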


In [None]:
import pandas as pd
myDetails = pd.read_csv('NewData.csv')
print(myDetails.head())

    Name  Age      City
0  Rahul   32   Mathura
1    Raj   30  New York
2    Ram   35   Lucknow


In [None]:
import pandas as pd
PersonData = pd.read_csv('NewData.csv')
print(PersonData.head())

    Name  Age      City
0  Rahul   32   Mathura
1    Raj   30  New York
2    Ram   35   Lucknow
3  Name1   40     City1
4  Name2   43     City2


In [None]:
import pandas as GLA  # any alias works, but 'pd' is the standard convention
StudData = GLA.read_csv('NewData.csv')
print(StudData.head())

    Name  Age      City
0  Rahul   32   Mathura
1    Raj   30  New York
2    Ram   35   Lucknow


In [None]:
import pandas as pd

# Step 1: Read the existing CSV file
existing_data = pd.read_csv('NewData.csv')

# Step 2: Create a new DataFrame with the additional data
# Replace the example rows below with the data you want to add
new_data = pd.DataFrame({
    'Name': ['Sachin', 'Virat'],  # Add more names as needed
    'Age': [40, 43],         # Add corresponding ages
    'City': ['Mumbai', 'Delhi']   # Add corresponding cities
})

# Step 3: Append the new DataFrame to the existing one using concat
updated_data = pd.concat([existing_data, new_data], ignore_index=True)
# With ignore_index=True, the index values of the appended rows are discarded and
# pandas assigns a fresh index to the result, numbered sequentially from 0.

# Step 4: Save the updated DataFrame back to the CSV file
updated_data.to_csv('NewData.csv', index=False)
# The index parameter controls whether the DataFrame's index is written to the CSV file.
# With index=False, only the data columns are written.
# Note: re-running this cell appends the same rows again.
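Reading and rewriting the whole file works, but `to_csv` can also append rows to the end of an existing file with `mode="a"`. This sketch writes to a hypothetical `people.csv` in a temporary directory so it does not touch NewData.csv; it assumes the new rows use the same column order as the file.

```python
import os
import tempfile
import pandas as pd

# Hypothetical file in a temporary directory, seeded with one row
path = os.path.join(tempfile.mkdtemp(), "people.csv")
pd.DataFrame({"Name": ["Rahul"], "Age": [32], "City": ["Mathura"]}).to_csv(path, index=False)

new_rows = pd.DataFrame({"Name": ["Sachin", "Virat"],
                         "Age": [40, 43],
                         "City": ["Mumbai", "Delhi"]})

# Append to the end of the file instead of rewriting it;
# header=False avoids writing the column names a second time
new_rows.to_csv(path, mode="a", header=False, index=False)

result = pd.read_csv(path)
print(result)
```

Appending avoids loading the existing file into memory, at the cost of no check that the columns line up.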

In [None]:
import pandas as pd

# Read the existing CSV file
data = pd.read_csv('NewData.csv')

# Calculate the sum of the 'Age' column
age_sum = data['Age'].sum()
print(f"Total Sum of Ages: {age_sum}")

Total Sum of Ages: 180


In [None]:
import pandas as pd

# Read the existing CSV file
data = pd.read_csv('NewData.csv')

# Calculate the mean of the 'Age' column
age_mean = data['Age'].mean()
print(f"Mean Age: {age_mean}")

Mean Age: 36.0


In [None]:
import pandas as pd

# Read the existing CSV file
data = pd.read_csv('NewData.csv')

# Count the number of non-NA/null entries in the 'Age' column
age_count = data['Age'].count()
print(f"Count of Ages: {age_count}")

Count of Ages: 5
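The three cells above each compute one statistic. `describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum in a single call, and `agg()` can compute a chosen list of statistics at once. This sketch uses an inline copy of the data; the five ages (32, 30, 35, 40, 43) are an assumption consistent with the totals printed above.

```python
import io
import pandas as pd

# Assumed contents of NewData.csv after the append step
csv_text = """Name,Age,City
Rahul,32,Mathura
Raj,30,New York
Ram,35,Lucknow
Sachin,40,Mumbai
Virat,43,Delhi
"""
data = pd.read_csv(io.StringIO(csv_text))

summary = data["Age"].describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# Several statistics at once, by name
print(data["Age"].agg(["sum", "mean", "count"]))
```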


# Additional Examples
Here are some additional examples of common statistical and data manipulation operations you can perform:

1. **Calculating the Median of a Column**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Calculate the median of the 'Age' column
   age_median = data['Age'].median()
   print(f"Median Age: {age_median}")
   ```

2. **Calculating the Standard Deviation of a Column**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Calculate the sample standard deviation of the 'Age' column (pandas uses ddof=1 by default)
   age_std = data['Age'].std()
   print(f"Standard Deviation of Age: {age_std}")
   ```

3. **Calculating the Minimum and Maximum of a Column**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Calculate the minimum age
   age_min = data['Age'].min()
   print(f"Minimum Age: {age_min}")

   # Calculate the maximum age
   age_max = data['Age'].max()
   print(f"Maximum Age: {age_max}")
   ```

4. **Calculating the Variance of a Column**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Calculate the variance of the 'Age' column
   age_variance = data['Age'].var()
   print(f"Variance of Age: {age_variance}")
   ```

   Variance quantifies the spread, or dispersion, of the values in a column. It is built from the squared deviation of each value from the column mean, so values far from the mean increase it sharply. By default pandas computes the sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (`ddof=1`); pass `ddof=0` to divide by $n$ instead and obtain the population variance.

5. **Calculating Quantiles of a Column**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Calculate the 25th and 75th percentiles of the 'Age' column
   age_25th_quantile = data['Age'].quantile(0.25)
   age_75th_quantile = data['Age'].quantile(0.75)
   print(f"25th Percentile of Age: {age_25th_quantile}")
   print(f"75th Percentile of Age: {age_75th_quantile}")
   ```

6. **Grouping Data and Calculating Group-wise Statistics**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Group by 'City' and calculate the mean age within each group
   mean_age_by_city = data.groupby('City')['Age'].mean()
   print(f"Mean Age by City:\n{mean_age_by_city}")
   ```

7. **Finding Unique Values in a Column**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Find unique values in the 'City' column
   unique_cities = data['City'].unique()
   print(f"Unique Cities: {unique_cities}")
   ```

8. **Filtering Data Based on a Condition**:
   ```python
   import pandas as pd

   # Read the existing CSV file
   data = pd.read_csv('NewData.csv')

   # Filter rows where 'Age' is greater than 30
   over_30 = data[data['Age'] > 30]
   print(f"Data for Ages Over 30:\n{over_30.head()}")
   ```
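To make the numbers in these examples concrete, here is a small worked sketch on an assumed set of five ages (32, 30, 35, 40, 43, consistent with the sum of 180 and mean of 36.0 printed earlier). With a mean of 36, the squared deviations sum to 16 + 36 + 1 + 16 + 49 = 118, so the sample variance is 118 / 4 and the population variance is 118 / 5. The small `people` table used for the group-wise statistics is hypothetical.

```python
import pandas as pd

# Assumed ages consistent with the totals printed earlier
ages = pd.Series([32, 30, 35, 40, 43], name="Age")

print(ages.median())        # middle value of 30, 32, 35, 40, 43
print(ages.var())           # sample variance (ddof=1): 118 / 4
print(ages.var(ddof=0))     # population variance: 118 / 5
print(ages.std() ** 2)      # std is the square root of var with the same ddof
print(ages.quantile([0.25, 0.75]).tolist())  # linear interpolation on sorted values

# Several group-wise statistics at once (hypothetical data)
people = pd.DataFrame({"City": ["Mathura", "Delhi", "Delhi"], "Age": [32, 40, 44]})
stats = people.groupby("City")["Age"].agg(["mean", "min", "max"])
print(stats)
```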