1. How do you read only specific columns from a CSV?

**ans** - To read specific columns from a CSV file, pass the **usecols** parameter to the read_csv function.
- syntax : pd.read_csv(filepath, usecols = [column_names]), where usecols takes a list of column names or zero-based column positions

In [4]:
import pandas as pd
df = pd.read_csv("C:\\Users\\mohin\\Downloads\\student_dataset.csv")

In [6]:
df

Unnamed: 0,StudentID,Name,Gender,Course,Age,Marks,Grade,City
0,2001,Isha,Female,Data Science,27,82,F,Bangalore
1,2002,Aarav,Female,Data Science,27,42,A,Hyderabad
2,2003,Vivaan,Male,Web Development,24,98,B,Kolkata
3,2004,Sanya,Female,Web Development,23,52,A,Hyderabad
4,2005,Anaya,Female,Web Development,28,51,A,Hyderabad
...,...,...,...,...,...,...,...,...
145,2146,Anaya,Male,Cloud Computing,19,44,B,Bangalore
146,2147,Meera,Female,AI & ML,18,57,D,Hyderabad
147,2148,Kabir,Female,Cloud Computing,21,92,F,Bangalore
148,2149,Sanya,Female,Cyber Security,26,60,D,Delhi


In [16]:
# here we read one specific column, 'Name' (position 1, counting from 0)
df = pd.read_csv("C:\\Users\\mohin\\Downloads\\student_dataset.csv",usecols = [1]) # usecols = ['Name'] selects the same column by label
df

Unnamed: 0,Name
0,Isha
1,Aarav
2,Vivaan
3,Sanya
4,Anaya
...,...
145,Anaya
146,Meera
147,Kabir
148,Sanya
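
The same idea can be sketched without the file on disk. The small in-memory CSV below is a hypothetical stand-in for student_dataset.csv (only a few of its columns and rows), and it shows that selecting by name and by position should give the same result.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for student_dataset.csv
csv_text = """StudentID,Name,Gender,Marks
2001,Isha,Female,82
2002,Aarav,Female,42
2003,Vivaan,Male,98
"""

# Select columns by name
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Marks"])

# Select the same columns by zero-based position
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[1, 3])

print(by_name)
print(by_name.equals(by_position))
```

Selecting by name is usually the safer choice, since positions change if the file gains or loses a column.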


---------------------------------------------------------------------------------------------------------------------------------------------------------

2. What is the purpose of .eval() in pandas? When would you use it?

**ans** - The DataFrame.eval() method evaluates a string expression that describes operations on the columns of a DataFrame, such as 'A + B'. It lets you write column arithmetic and comparisons in a compact string-based syntax.

- when to use .eval()
    - when you need to perform filtering or calculations that combine several columns.
    - when you want to avoid creating intermediate results on large DataFrames, which can improve performance.

In [20]:
df = pd.DataFrame({
    'A':[11,22,33],
    'B':[44,55,66]
})
df

Unnamed: 0,A,B
0,11,44
1,22,55
2,33,66


In [24]:
df.eval('A+B')

0    55
1    77
2    99
dtype: int64
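
Beyond returning a Series, eval() can also create a column or build a boolean mask. The sketch below reuses the A/B frame from above; the names with_sum and mask are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [11, 22, 33], "B": [44, 55, 66]})

# An assignment inside the expression returns a new DataFrame with column C;
# df itself is left unchanged because inplace defaults to False
with_sum = df.eval("C = A + B")

# A comparison returns a boolean Series that can be used as a filter
mask = df.eval("A + B > 60")

print(with_sum)
print(df[mask])
```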

---------------------------------------------------------------------------------------------------------------------------------------------------------

3. What is the difference between str.replace() and str.extract()? Explain along with example.

**ans** - 
1. **str.replace()**
- replaces a specified substring (or, with regex=True, a regular-expression match) with another string in a string column
- returns a new Series with the replaced values; the original column is not modified.

In [27]:
df = pd.DataFrame({
    'text':['Good morning','Good night']
})
df

Unnamed: 0,text
0,Good morning
1,Good night


In [31]:
res = df['text'].str.replace('Good','Happy')
print(res)

0    Happy morning
1      Happy night
Name: text, dtype: object


2. **str.extract()**
- extracts the capture groups of a regular expression from each value in a string column.
- returns a DataFrame with one column per capture group (or a Series when there is a single group and expand=False); rows that do not match get NaN.
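
A minimal sketch of str.extract() on the same 'text' column used for str.replace() above. The regular expressions and the group names greeting and time_of_day are chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"text": ["Good morning", "Good night"]})

# One capture group with expand=False returns a Series
second_word = df["text"].str.extract(r"Good (\w+)", expand=False)

# Named capture groups return a DataFrame with one column per group
parts = df["text"].str.extract(r"(?P<greeting>\w+) (?P<time_of_day>\w+)")

print(second_word)
print(parts)
```

So str.replace() changes the matched text and keeps the rest of the string, while str.extract() keeps only the captured pieces.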

---------------------------------------------------------------------------------------------------------------------------------------------------------

4. How do you handle time series data in pandas? Explain all the methods.

**ans** - pandas provides several methods to handle time series data, as follows:
1. **creating a time series** - you can create a time series by specifying a date range and values.

In [39]:
date_range = pd.date_range('2025-01-01','2025-01-10')
date_range

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
               '2025-01-05', '2025-01-06', '2025-01-07', '2025-01-08',
               '2025-01-09', '2025-01-10'],
              dtype='datetime64[ns]', freq='D')

2. **Converting strings to DateTime** - you can convert string columns to datetime format using pd.to_datetime()
- df['date_range'] = pd.to_datetime(df['date_range'])

3. **Setting DateTime index** - you can set a datetime column as the index of a DataFrame
- df.set_index('date',inplace = True)

4. **Rolling window calculations** - you can perform rolling window calculations using **rolling()**
- ts.rolling(window = 3).mean() - will calculate the rolling mean with a window size of 3

5. **Resampling time series** - you can resample a time series to a different frequency using **resample()**
- ts.resample('M').mean() - will resample to monthly frequency and calculate the mean (pandas 2.2 and later spell the month-end alias 'ME')

6. **Shifting time series** - you can shift a time series by a specified number of periods using **shift()**
- ts.shift(1) - will shift the time series by 1 period

7. **Time series plotting** - you can plot time series data using **plot()**
- ts.plot()

8. **Handling missing values** - you can handle missing values in time series data using **fillna()** or **interpolate()**
- ts.fillna(ts.mean()) - fill missing values with the mean
- ts.interpolate() - interpolate missing values

9. **Time series decomposition** - you can decompose a time series into trend, seasonality and residuals using **seasonal_decompose()** from statsmodels


In [51]:
# Requires the statsmodels package. ts is assumed to be an existing Series with a
# DatetimeIndex that has a set frequency; otherwise pass period=... explicitly.
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(ts)

10. **Time series forecasting** 
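
Several of the steps listed above can be sketched together on one small series. The data below is hypothetical (six daily values with one gap), and a 2-day resampling bin is used instead of a monthly one only so that the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data with one missing value
dates = pd.to_datetime([
    "2025-01-01", "2025-01-02", "2025-01-03",
    "2025-01-04", "2025-01-05", "2025-01-06",
])
df = pd.DataFrame({"date": dates, "value": [10.0, 12.0, np.nan, 16.0, 18.0, 20.0]})

# Use the datetime column as the index to get a time series
ts = df.set_index("date")["value"]

filled = ts.interpolate()                        # linear interpolation fills the gap
rolling_mean = filled.rolling(window=3).mean()   # first two entries are NaN
two_day_mean = filled.resample("2D").mean()      # downsample into 2-day bins
previous_day = filled.shift(1)                   # each row sees the prior day's value

print(filled)
print(rolling_mean)
print(two_day_mean)
print(previous_day)
```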

--------------------------------------------------------------------------------------------------------------------------------------------------------

5. Explain methods related to date and time.

**ans** - 
1. pd.to_datetime():
- converts a string or other format to a datetime object.

In [59]:
import pandas as pd
date_string = '2023-01-01'
date_object = pd.to_datetime(date_string)
date_object

Timestamp('2023-01-01 00:00:00')

2. dt.date:
- extracts the date part from a datetime object
- df['date'] = df['datetime'].dt.date

3. dt.time:
- extracts the time part from a datetime object.
- df['time'] = df['datetime'].dt.time

4. dt.year, dt.month, dt.day
- extracts the year, month or day from a datetime object.

5. dt.hour, dt.minute, dt.second
- extracts the hour, minute or second from a datetime object.

6. dt.weekday
- returns the day of the week as an integer (Monday = 0 and Sunday = 6)

7. dt.isocalendar()
- returns the ISO year, week and weekday (as a DataFrame with year, week and day columns)

8. pd.date_range() - creates a DatetimeIndex of evenly spaced dates (daily by default).


In [72]:
pd.date_range('2023-01-01','2023-01-31')

DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
               '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08',
               '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12',
               '2023-01-13', '2023-01-14', '2023-01-15', '2023-01-16',
               '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20',
               '2023-01-21', '2023-01-22', '2023-01-23', '2023-01-24',
               '2023-01-25', '2023-01-26', '2023-01-27', '2023-01-28',
               '2023-01-29', '2023-01-30', '2023-01-31'],
              dtype='datetime64[ns]', freq='D')

9. pd.period_range() - creates a period range with a specified frequency.

In [86]:

"""
M - monthly frequency
D - daily frequency
W - weekly frequency
Q - quarterly frequency
A - annual frequency
"""
period_range = pd.period_range('2023-01','2023-12',freq = 'M')
period_range

PeriodIndex(['2023-01', '2023-02', '2023-03', '2023-04', '2023-05', '2023-06',
             '2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12'],
            dtype='period[M]')
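
The dt accessors listed above can be tried on a small frame. This is only a sketch: the 'datetime' column and its two timestamps are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2023-01-01 09:30:15", "2023-06-15 18:45:00"])
})

df["year"] = df["datetime"].dt.year
df["month"] = df["datetime"].dt.month
df["hour"] = df["datetime"].dt.hour
df["weekday"] = df["datetime"].dt.weekday   # Monday = 0, Sunday = 6

# isocalendar() is a method and returns a DataFrame (year, week, day)
iso = df["datetime"].dt.isocalendar()

print(df)
print(iso)
```

Note that 2023-01-01 is a Sunday that belongs to ISO week 52 of 2022, so the ISO year can differ from dt.year near year boundaries.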

---------------------------------------------------------------------------------------------------------------------------------------------------------

6. What is the purpose of MultiIndex in pandas?

**ans** - 
- it is also known as hierarchical index
- **purpose**
    - enable data to be organized and analyzed by multiple categories or dimensions.
    - provide a way to represent complex data relationships and hierarchies.
    - allow for more efficient and flexible data analysis and manipulation

In [92]:
data = {'value':[10,20,30,40,50]}
index = pd.MultiIndex.from_tuples([('A',1),('A',2),('B',1),('B',2),('C',1)], names = ['letter','number'])
df = pd.DataFrame(data,index = index)
print(df)

               value
letter number       
A      1          10
       2          20
B      1          30
       2          40
C      1          50
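
Once the hierarchical index exists, the levels can be used for selection and aggregation. The sketch below rebuilds the same frame; the variable names rows_for_a, number_one and totals are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1)],
    names=["letter", "number"],
)
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=index)

# All rows under the outer label 'A'
rows_for_a = df.loc["A"]

# Cross-section on the inner level: every letter where number == 1
number_one = df.xs(1, level="number")

# Aggregate by one level of the index
totals = df.groupby(level="letter")["value"].sum()

print(rows_for_a)
print(number_one)
print(totals)
```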


---------------------------------------------------------------------------------------------------------------------------------------------------------

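7. What is the difference between copy() and view()?

**ans** :
- copy() creates a deep copy of the original DataFrame or Series by default (deep=True), while a view is an object that refers to the same underlying data as the original. pandas DataFrames do not have a view() method of their own; view() is a NumPy array method, and views in pandas arise from some indexing operations.
- copy() allocates new memory for the copied data, so changes to the copy never affect the original; a view does not allocate new memory, so changes made through it can show up in the original.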
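
A minimal sketch of the difference. The view half uses a NumPy array slice, because that is where view semantics are simplest to demonstrate; whether a particular pandas indexing operation yields a view depends on the pandas version and its copy-on-write settings, so that part is not shown here.

```python
import numpy as np
import pandas as pd

# NumPy: a basic slice is a view that shares memory with the original array
arr = np.array([1, 2, 3, 4])
view = arr[:2]
view[0] = 99                 # also changes arr[0]

# pandas: copy() (deep=True by default) produces independent data
df = pd.DataFrame({"A": [1, 2, 3]})
df_copy = df.copy()
df_copy.loc[0, "A"] = 100    # df is unchanged

print(arr)
print(np.shares_memory(arr, view))
print(df)
print(df_copy)
```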

---------------------------------------------------------------------------------------------------------------------------------------------------------

8. What is method chaining in pandas?

**ans** - method chaining is a programming technique in pandas where multiple methods are called one after another in a single expression,
- each method returns a new DataFrame or Series (not the original object modified in place), allowing the next method to be called directly on that result.
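
A short sketch of a chain. The data is a hypothetical four-row frame in the style of the student dataset, and the Percent column is invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Isha", "Aarav", "Vivaan", "Sanya"],
    "Course": ["Data Science", "Data Science", "Web Development", "Web Development"],
    "Marks": [82, 42, 98, 52],
})

# Each step returns a new DataFrame that feeds the next step
result = (
    df.query("Marks >= 50")
      .assign(Percent=lambda d: d["Marks"] / 100)
      .sort_values("Marks", ascending=False)
      .reset_index(drop=True)
)

print(result)
print(df.shape)   # the original frame is untouched
```

Wrapping the chain in parentheses lets each method sit on its own line, which keeps longer pipelines readable.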

---------------------------------------------------------------------------------------------------------------------------------------------------------

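9. What is the difference between shift() and diff()?

**ans**:
- shift() moves the values in a Series or DataFrame forward or backward by a specified number of periods, leaving NaN where no value is available.
- diff() calculates the difference between each value and the value a given number of periods earlier (1 by default) in a Series or DataFrame, so s.diff() is the same as s - s.shift(1).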
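
The relationship between the two can be checked on a small series; the values here are arbitrary.

```python
import pandas as pd

s = pd.Series([10, 15, 25, 40])

shifted = s.shift(1)   # NaN, 10, 15, 25
diffed = s.diff()      # NaN, 5, 10, 15

# diff() is equivalent to subtracting the shifted series
same = diffed.equals(s - s.shift(1))

print(shifted)
print(diffed)
print(same)
```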

---------------------------------------------------------------------------------------------------------------------------------------------------------

10. How would you handle a dataset with mixed timezones?

**ans**:
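1. identify timezone information - you can use the **dt.tz** attribute to check whether a datetime column has timezone information (it is None for timezone-naive data).
2. convert to a standard timezone - use dt.tz_localize() to attach a timezone to naive timestamps, then dt.tz_convert('UTC'); strings with mixed UTC offsets can be parsed straight to UTC with pd.to_datetime(..., utc=True).
3. Handle ambiguous datetimes - daylight saving transitions can make a local time ambiguous or nonexistent; the ambiguous and nonexistent parameters of tz_localize() control how these are resolved.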
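
One possible way to normalize such data is sketched below. The timestamps are hypothetical, and 'Asia/Kolkata' is just an example target zone.

```python
import pandas as pd

# Hypothetical timestamps carrying different UTC offsets
raw = pd.Series([
    "2025-01-01 09:00:00+05:30",
    "2025-01-01 04:00:00+00:00",
    "2024-12-31 23:00:00-05:00",
])

# utc=True parses every value and converts it to UTC
utc_times = pd.to_datetime(raw, utc=True)

# Convert the normalized column to a timezone for display
india_times = utc_times.dt.tz_convert("Asia/Kolkata")

# Timezone-naive timestamps must be localized before they can be converted
naive = pd.Series(pd.to_datetime(["2025-01-01 09:00:00"]))
localized = naive.dt.tz_localize("Asia/Kolkata").dt.tz_convert("UTC")

print(utc_times)
print(india_times)
print(localized)
```

Storing everything in UTC and converting only for display tends to keep comparisons and arithmetic unambiguous.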

---------------------------------------------------------------------------------------------------------------------------------------------------------



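11. What is SparseDataFrame? When would you use it?

**ans** : 
- A SparseDataFrame was a type of DataFrame in pandas optimized for storing and manipulating data in which most values are the same, typically zero or NaN. It was removed in pandas 1.0; the same behavior is now obtained from a regular DataFrame whose columns use a sparse dtype (pd.SparseDtype).
- Sparse columns are particularly useful when working with datasets that have a high proportion of zeros or NaN values, because only the values that differ from the fill value are stored.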

- When to Use:

    - High-dimensional data: When working with high-dimensional data that has a large number of features, many of which may be zero or NaN.
    - Sparse matrices: When working with sparse matrices, such as those encountered in linear algebra or machine learning applications.
    - Large datasets: When working with large datasets that have a high proportion of missing or null values.
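
In current pandas versions, sparse storage is applied per column through a sparse dtype. The sketch below uses a small, mostly-zero frame invented for the example.

```python
import pandas as pd

# Hypothetical mostly-zero data
dense = pd.DataFrame({
    "x": [0, 0, 0, 5, 0, 0, 0, 0],
    "y": [0, 1, 0, 0, 0, 0, 0, 0],
})

# Convert every column to a sparse dtype that stores only the non-zero values
sparse = dense.astype(pd.SparseDtype("int64", fill_value=0))

print(sparse.dtypes)
print(sparse.sparse.density)   # fraction of values that are actually stored

# Convert back to ordinary dense columns when needed
back = sparse.sparse.to_dense()
print(back)
```

On a frame this small the memory saving is negligible; the benefit shows up when the data is large and the density is low.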

---------------------------------------------------------------------------------------------------------------------------------------------------------

12. How does pd.IndexSlice work with MultiIndex?

**ans** : 
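
pd.IndexSlice is a helper for writing slices across the levels of a MultiIndex inside .loc: each position in idx[...] corresponds to one index level, and `:` means "everything on this level". The sketch below is a minimal illustration on a made-up 3 x 3 index; slicing this way generally expects the index to be sorted.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B", "C"], [1, 2, 3]], names=["letter", "number"]
)
df = pd.DataFrame({"value": range(10, 100, 10)}, index=index)  # 9 rows: 10..90

idx = pd.IndexSlice

# All letters, numbers 2 through 3 (label slices include both endpoints)
numbers_2_to_3 = df.loc[idx[:, 2:3], :]

# Letters A through B, number 1 only
a_to_b_number_1 = df.loc[idx["A":"B", 1], :]

print(numbers_2_to_3)
print(a_to_b_number_1)
```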



---------------------------------------------------------------------------------------------------------------------------------------------------------

13. What is the purpose of at and iat? How do they differ from loc/iloc?

**ans** - 

**purpose**
- at and iat provide fast and efficient access to a single value in a DataFrame or Series.
- at and iat also allow you to modify a single value in a DataFrame or Series.

Difference from loc and iloc:

- **Single value access**: at and iat are designed for accessing and modifying single values, while loc and iloc can access multiple values or rows and columns.
- **Speed**: at and iat are faster than loc and iloc for accessing single values because they are optimized for this specific use case.
- **Syntax**: The syntax for at and iat is similar to loc and iloc, but on a DataFrame at requires exactly one row label and one column label and iat exactly one row position and one column position, while loc and iloc also accept lists, slices, and boolean masks.
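
A small sketch of the accessors side by side; the frame and its index labels s1 and s2 are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Isha", "Aarav"], "Marks": [82, 42]},
    index=["s1", "s2"],
)

by_label = df.at["s2", "Marks"]      # single value by row/column label
by_position = df.iat[1, 1]           # single value by integer position

# at and iat can also set a single value
df.at["s2", "Marks"] = 45

# loc can return several values at once
rows = df.loc[["s1", "s2"], "Marks"]

print(by_label, by_position)
print(rows)
```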

---------------------------------------------------------------------------------------------------------------------------------------------------------

14. What is the difference between merge_asof() and regular merge()?

**ans** : 
- **Match type**: merge() performs an exact match on the key, while merge_asof() matches each left row to the nearest key in the right frame according to the direction specified (backward, which is the default, forward, or nearest).
- **Use case**: merge() is suitable for general data merging, while merge_asof() is specifically designed for merging time-series or other ordered data; both frames must be sorted by the key.
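
The difference shows up when the keys never line up exactly. The trades and quotes frames below are hypothetical.

```python
import pandas as pd

trades = pd.DataFrame({
    "time": pd.to_datetime(["2025-01-01 09:00:05", "2025-01-01 09:00:12"]),
    "qty": [100, 200],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2025-01-01 09:00:00", "2025-01-01 09:00:10"]),
    "price": [50.0, 51.0],
})

# Exact match: no timestamps are identical, so the inner join is empty
exact = pd.merge(trades, quotes, on="time")

# As-of match: each trade picks up the most recent quote at or before it
asof = pd.merge_asof(trades, quotes, on="time")

print(exact)
print(asof)
```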

---------------------------------------------------------------------------------------------------------------------------------------------------------

15. What is pd.NA? How does it differ from np.nan?

**ans** : 
- **Propagation**: pd.NA propagates through most operations, including comparisons: pd.NA == 1 evaluates to pd.NA rather than True or False. np.nan follows floating-point rules instead, so np.nan == np.nan is False, which can be surprising.
- **Type**: pd.NA is a pandas-specific missing value marker used by the nullable dtypes (Int64, boolean, string), while np.nan is a NumPy floating-point value meaning Not a Number, so putting it in an integer column converts the column to float.
- **Behavior**: pd.NA is designed to behave consistently across dtypes, and in logical operations it follows three-valued logic (for example, pd.NA | True is True).
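
These differences can be observed directly. The series below are arbitrary examples.

```python
import numpy as np
import pandas as pd

# np.nan forces an integer column to float
float_col = pd.Series([1, 2, np.nan])

# pd.NA with a nullable dtype keeps the column as integers
int_col = pd.Series([1, 2, pd.NA], dtype="Int64")

nan_equal = np.nan == np.nan   # False under floating-point rules
na_equal = pd.NA == 1          # pd.NA: the result is unknown
na_or_true = pd.NA | True      # True: the outcome does not depend on the missing value

print(float_col.dtype, int_col.dtype)
print(nan_equal, na_equal, na_or_true)
```

Because a comparison with pd.NA is neither True nor False, use pd.isna() or .isna() to test for missing values rather than ==.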

---------------------------------------------------------------------------------------------------------------------------------------------------------

1. Write a Python program to check if a string is a valid email address.

In [3]:
import re
def is_valid(mail):
    pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
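    # fullmatch requires the whole string to match (re.match with $ would also accept a trailing newline)
    return bool(re.fullmatch(pattern,mail))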
mail = input('enter your email id = ')
if is_valid(mail):
    print('valid email id')
else:
    print('please enter the valid email id')

enter your email id =  mohini@gmail.com


valid email id


---------------------------------------------------------------------------------------------------------------------------------------------------------

2. Write a Python program to find the longest common prefix among a list of strings.


In [12]:
def longest_prefix(string):
    if not string:
        return ""
    prefix = ""
    for chars in zip(*string):
        if len(set(chars))==1:
            prefix += chars[0]
        else:
            break
    return prefix

string = ['cat','carrom','can']
print(longest_prefix(string))

ca


---------------------------------------------------------------------------------------------------------------------------------------------------------

3. Given a string, create a new string with the same characters in a random order.


In [32]:
import random
def random_order(string):
    return ''.join(random.sample(string,len(string)))

main_string = 'mohini'
random_order_string = random_order(main_string)
print(f'Randomized order string : {random_order_string}')

Randomized order string : iimhno


---------------------------------------------------------------------------------------------------------------------------------------------------------

4. Implement a method to perform basic string compression using the counts of repeated characters. For ex: "aaaabbccc" -> "a4b2c3"


In [37]:
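def compress(string):
    # An empty string has nothing to compress (and string[-1] would raise IndexError)
    if not string:
        return ""
    result = ""
    count = 1
    for i in range(1, len(string)):
        if string[i] == string[i - 1]:
            count += 1
        else:
            # The run ended: emit the previous character and its count
            result += string[i - 1] + str(count)
            count = 1
    # Emit the final run
    result += string[-1] + str(count)
    return result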


main_string = "aaaabbccc"
compressed_string = compress(main_string)
print(compressed_string)  

a4b2c3


---------------------------------------------------------------------------------------------------------------------------------------------------------

5. Write a Python Program to Count Number of Uppercase and Lowercase Letters in a String

In [40]:
def count_case(s):
    uppercase_count = sum(1 for char in s if char.isupper())
    lowercase_count = sum(1 for char in s if char.islower())
    
    return uppercase_count, lowercase_count

# Example usage:
s = "i am MOHINI"
uppercase, lowercase = count_case(s)
print(f"Uppercase letters: {uppercase}")
print(f"Lowercase letters: {lowercase}")

Uppercase letters: 6
Lowercase letters: 3
