
# Limitations of File Handling in Python for Structured Data

<b>Memory Usage:</b>

Large Files: Loading large files can cause memory issues; use chunking to mitigate this.

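Inefficient Memory Management: Reading an entire file into memory at once (for example with `read()` or `readlines()`) is wasteful when the data could be processed one line or one batch at a time.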
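A minimal sketch of chunking with only the standard library. Rows are consumed in fixed-size batches instead of loading the whole file. The in-memory `io.StringIO` sample stands in for a real file opened with `open(path, newline='')`, and the helper name `iter_chunks` is hypothetical. The sample rows reuse values from the `Airline.csv` output shown further below.

```python
import csv
import io
from itertools import islice

def iter_chunks(rows, size):
    """Yield lists of at most `size` rows; only one batch is held in memory at a time."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Stand-in for a large CSV file on disk
sample = io.StringIO("id,price\n0,5953\n1,5953\n2,5956\n3,5955\n4,5955\n")

reader = csv.reader(sample)
header = next(reader)            # read the header row once
total = 0
for batch in iter_chunks(reader, size=2):
    total += sum(int(row[1]) for row in batch)   # process the batch, then let it go

print(header, total)
```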

<b>Performance:</b>

Slow I/O Operations: Reading/writing large datasets can be slow.

Inefficient Operations: Processing rows one at a time in pure Python loops is much slower than vectorized operations in libraries such as NumPy or pandas.

<b>Data Corruption:</b>

Partial Writes: Interruption during writing can corrupt files.

Encoding Issues: Reading or writing text with the wrong encoding can garble characters or raise a `UnicodeDecodeError`.
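One common guard against partial writes is to write to a temporary file in the same directory and then swap it into place with `os.replace`, so the target is either the old file or the complete new one. Passing an explicit `encoding` also avoids depending on the platform default. This is a sketch, and `atomic_write_text` is a hypothetical helper name.

```python
import os
import tempfile

def atomic_write_text(path, text, encoding="utf-8"):
    """Write `text` to `path` so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory, so os.replace is a simple rename
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the bytes reach the disk first
        os.replace(tmp_path, path)   # swap in the finished file (atomic on POSIX)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)      # never leave a half-written temp file behind
        raise

# Usage in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "prices.csv")
    atomic_write_text(target, "id,price\n0,5953\n")
    leftover = sorted(os.listdir(d))
    with open(target, encoding="utf-8", newline="") as f:
        content = f.read()

print(leftover, repr(content))
```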

<b>Concurrency:</b>

File Locking: Safe concurrent access from multiple processes requires file locking, which the built-in `open()` does not provide.

Thread Safety: Ensuring thread-safe file operations is challenging.

<b>Data Types and Formatting:</b>

Data Type Inference: The `csv` module returns every field as a string, so numbers and dates must be converted manually; wrong conversions cause processing errors.

Date and Time Formatting: Parsing inconsistent formats is error-prone.

Special Characters: Handling delimiters and special characters can be tricky.
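A short sketch of both problems with the built-in `csv` module: every value arrives as a string and must be converted explicitly, and a field containing the delimiter has to be quoted to stay a single field. The row `"Delhi, India"` and the day-month-year date format are hypothetical assumptions about the data.

```python
import csv
import io
from datetime import datetime

raw = 'city,stops,duration,date\n"Delhi, India",zero,2.17,12-03-2024\n'
row = next(csv.DictReader(io.StringIO(raw)))

print(type(row["duration"]))     # every value is a str

# Explicit conversions; "%d-%m-%Y" assumes day-month-year dates
duration = float(row["duration"])
date = datetime.strptime(row["date"], "%d-%m-%Y").date()
print(row["city"], duration, date)

# When writing, csv.writer quotes fields that contain the delimiter
out = io.StringIO()
csv.writer(out).writerow(["Delhi, India", 2.17])
print(out.getvalue())
```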

<b>Error Handling:</b>

File Not Found: Missing files or incorrect paths need to be managed.

Read/Write Errors: Permission, disk, and decoding errors must be caught and handled explicitly.
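A minimal sketch of handling the most common file errors explicitly. The helper `read_rows` and the file name are hypothetical. Returning an empty list is only one design choice; re-raising or logging may suit a caller better. Reading all rows into a list is fine here but not for very large files.

```python
import csv

def read_rows(path):
    """Return all rows of a CSV file, or an empty list if it cannot be read."""
    try:
        with open(path, "r", newline="", encoding="utf-8") as f:
            return list(csv.reader(f))
    except FileNotFoundError:
        print(f"File not found: {path}")
    except PermissionError:
        print(f"No permission to read: {path}")
    except UnicodeDecodeError as exc:
        print(f"Could not decode {path} as UTF-8: {exc}")
    return []

rows = read_rows("does_not_exist.csv")   # hypothetical missing file
```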

<b>Library Limitations:</b>

Dependency on External Libraries: External libraries can have limitations or bugs.

Compatibility: Ensuring compatibility with different library versions is necessary.

<b>Security:</b>

Injection Attacks: File paths built from untrusted input (for example, containing `../`) must be validated to prevent path-traversal vulnerabilities.

Sensitive Data: Sensitive data must be protected when it is stored or shared.

<b>Cross-Platform Issues:</b>

Path Differences: Windows uses drive letters and `\` separators while Unix-like systems use `/`, so hard-coded Windows paths are not portable.

Line Endings: Windows ends lines with `\r\n` and Unix with `\n`; opening CSV files without `newline=''` can produce extra blank rows on Windows or misread line breaks inside quoted fields.
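A sketch of the portable approach. `pathlib` joins path parts with the correct separator for the current OS, and `newline=""` leaves line-ending handling to the `csv` module. The folder and file names reuse the ones from the example below, but the sketch writes into a temporary directory rather than a real user folder.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Join path parts with "/"; pathlib uses the correct separator on each OS
    path = Path(tmp) / "ML Project" / "Airline.csv"
    path.parent.mkdir(parents=True, exist_ok=True)

    # newline="" stops Python from translating line endings behind csv's back
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["id", "airline"], ["0", "SpiceJet"]])

    raw = path.read_bytes()          # csv.writer ends rows with "\r\n" by default
    with path.open("r", newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(raw)
print(rows)
```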

In [2]:
import csv

In [3]:
# Print every row of the CSV file; each row comes back as a list of strings
with open("C:/Users/LENOVO/Documents/ML Project/Airline.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

['id', 'airline', 'flight', 'source_city', 'departure_time', 'stops', 'arrival_time', 'destination_city', 'class', 'duration', 'days_left', 'price']
['0', 'SpiceJet', 'SG-8709', 'Delhi', 'Evening', 'zero', 'Night', 'Mumbai', 'Economy', '2.17', '1', '5953']
['1', 'SpiceJet', 'SG-8157', 'Delhi', 'Early_Morning', 'zero', 'Morning', 'Mumbai', 'Economy', '2.33', '1', '5953']
['2', 'AirAsia', 'I5-764', 'Delhi', 'Early_Morning', 'zero', 'Early_Morning', 'Mumbai', 'Economy', '2.17', '1', '5956']
['3', 'Vistara', 'UK-995', 'Delhi', 'Morning', 'zero', 'Afternoon', 'Mumbai', 'Economy', '2.25', '1', '5955']
['4', 'Vistara', 'UK-963', 'Delhi', 'Morning', 'zero', 'Morning', 'Mumbai', 'Economy', '2.33', '1', '5955']
['5', 'Vistara', 'UK-945', 'Delhi', 'Morning', 'zero', 'Afternoon', 'Mumbai', 'Economy', '2.33', '1', '5955']
['6', 'Vistara', 'UK-927', 'Delhi', 'Morning', 'zero', 'Morning', 'Mumbai', 'Economy', '2.08', '1', '6060']
...

(Output truncated: printing every row exceeded the notebook's IOPub data rate limit, so the notebook server stopped sending output.)

In [4]:
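reader['stops']  # fails: a csv.reader is an iterator over rows (lists), not a table, so it cannot be indexed by column name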

TypeError: '_csv.reader' object is not subscriptable
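With the standard library, `csv.DictReader` maps each row to the header names, which is the closest equivalent to what `reader['stops']` attempts. It still yields one row at a time, so a column has to be collected manually. The `StringIO` sample below is an assumption standing in for `Airline.csv` and keeps only a few columns from rows shown in the output above.

```python
import csv
import io
from collections import Counter

sample = io.StringIO(
    "id,airline,flight,stops,price\n"
    "0,SpiceJet,SG-8709,zero,5953\n"
    "6528,Vistara,UK-829,one,5227\n"
    "109055,Vistara,UK-897,two_or_more,6956\n"
)

reader = csv.DictReader(sample)
stops = [row["stops"] for row in reader]   # one column, gathered row by row

print(stops)
print(Counter(stops))
```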

# Pandas


Pandas is an open-source Python library for data manipulation, analysis, and cleaning, known for its ease of use and high performance in handling structured data.

<b>Data Structures:</b>

Series: A one-dimensional array-like object for time series or single-column data.
DataFrame: A two-dimensional labeled table, ideal for structured data like CSV files or SQL tables.

<b>Data Input/Output:</b>

Reads and writes many formats, including CSV, Excel, and SQL databases, through functions such as `read_csv`, `read_excel`, `read_sql`, and `to_csv`.
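For example, `pd.read_csv` loads a CSV file directly into a DataFrame and infers column types. Its `chunksize` parameter instead returns an iterator of smaller DataFrames, which addresses the memory limitation of loading everything at once. This sketch uses an in-memory sample (rows taken from the `Airline.csv` output, with fewer columns) instead of the real file.

```python
import io
import pandas as pd

csv_text = (
    "id,airline,class,price\n"
    "0,SpiceJet,Economy,5953\n"
    "1,SpiceJet,Economy,5953\n"
    "2,AirAsia,Economy,5956\n"
    "233247,Vistara,Business,38926\n"
)

# Whole file at once: column types are inferred (price becomes int64)
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# In chunks: each chunk is a DataFrame with at most 2 rows
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["price"].sum()
print(total)
```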

<b>Data Cleaning:</b>

Offers methods for handling missing data, removing duplicates, and converting data types.
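A small sketch of these cleaning steps chained together. The messy DataFrame (a duplicated row, a missing price, prices stored as text) is hypothetical.

```python
import pandas as pd

# Hypothetical messy data
df = pd.DataFrame({
    "airline": ["SpiceJet", "SpiceJet", "Vistara", "AirAsia"],
    "price": ["5953", "5953", None, "5956"],
})

cleaned = (
    df.drop_duplicates()                                  # drop the repeated SpiceJet row
      .dropna(subset=["price"])                           # drop rows without a price
      .assign(price=lambda d: pd.to_numeric(d["price"]))  # convert text to numbers
      .reset_index(drop=True)
)
print(cleaned)
print(cleaned.dtypes)
```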

<b>Data Selection and Filtering:</b>

Provides tools for selecting and filtering specific rows and columns from DataFrames.
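Unlike `csv.reader`, a DataFrame can be indexed by column name, filtered with boolean conditions, and sliced by label or by position. A sketch using a few rows from the `Airline.csv` output, with a reduced set of columns:

```python
import pandas as pd

df = pd.DataFrame({
    "airline": ["SpiceJet", "AirAsia", "Vistara", "Vistara"],
    "flight": ["SG-8709", "I5-764", "UK-995", "UK-963"],
    "departure_time": ["Evening", "Early_Morning", "Morning", "Morning"],
    "price": [5953, 5956, 5955, 5955],
})

prices = df["price"]                                  # one column -> Series
subset = df[["airline", "price"]]                     # several columns -> DataFrame
vistara = df[df["airline"] == "Vistara"]              # boolean filter on rows
morning = df.loc[df["departure_time"] == "Morning", ["flight", "price"]]  # rows + columns
first_two = df.iloc[:2]                               # rows by position
print(morning)
```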

<b>Data Manipulation:</b>

Enables transformations such as merging, reshaping, and aggregating data for analysis.
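A sketch of merging, aggregating, and reshaping. The prices come from rows in the `Airline.csv` output, while the `codes` lookup table is a hypothetical addition made up for the merge.

```python
import pandas as pd

flights = pd.DataFrame({
    "airline": ["Vistara", "Vistara", "Air_India", "Indigo"],
    "class": ["Business", "Business", "Business", "Economy"],
    "price": [38926, 48213, 39547, 4249],
})
# Hypothetical lookup table to merge in
codes = pd.DataFrame({"airline": ["Vistara", "Air_India", "Indigo"],
                      "code": ["UK", "AI", "6E"]})

merged = flights.merge(codes, on="airline", how="left")    # join on a key column
avg_price = merged.groupby("airline")["price"].mean()      # aggregate per group
wide = merged.pivot_table(index="airline", columns="class",
                          values="price", aggfunc="mean")  # reshape long -> wide
print(avg_price)
print(wide)
```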

<b>Time Series Analysis:</b>

Includes functions for handling and analyzing time series data.

<b>Statistical Analysis:</b>

Offers statistical functions like mean, median, and standard deviation for data analysis.

<b>Visualization:</b>

Integrates with libraries like Matplotlib and Seaborn for data visualization.

<b>Integration with NumPy:</b>

Built on top of NumPy, allowing seamless integration with NumPy arrays for numerical computing.






In [None]:
# Install pandas (needed only once per environment)

In [None]:
%pip install pandas

In [6]:
import pandas as pd

# Series

A Series is like a one-dimensional list or array that can hold data. Think of it as a single column in an Excel spreadsheet or a single list of numbers. It's a fundamental building block for working with data in Pandas. 

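Homogeneous Data: A Series has a single data type (dtype), so it works best when all values are of the same kind, such as all numbers, all text, or all dates. For example, you can have a Series of numbers representing daily temperatures. If the values are mixed, pandas falls back to the generic `object` dtype.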

Labels: Each item (or element) in a Series has a label called an index. This index can be numeric or text-based, and it helps you quickly locate and retrieve specific data points.

Size: A Series has a fixed length, which means it can hold a specific number of elements.

Operations: You can perform various operations on a Series, like arithmetic calculations, filtering, and sorting. For example, you can find the average temperature, filter out days below a certain temperature, or sort the temperatures from lowest to highest.

Data Types: Series can store different types of data, including numbers (integers, floats), text (strings), and even more complex types like dates or custom objects.

Similar to a Dictionary: A Series is similar to a Python dictionary but with added functionality. You can think of it as a dictionary where the keys are the index and the values are the data.
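A sketch of the dictionary analogy with hypothetical temperatures: labels work like keys, and `in` checks the index rather than the values. Beyond a dict, the values support vectorized operations.

```python
import pandas as pd

temps = pd.Series({"Mon": 21.5, "Tue": 23.0, "Wed": 19.8})  # hypothetical data

print(temps["Tue"])             # look up by label, like d["Tue"]
print("Wed" in temps)           # `in` checks the index (keys), not the values
print(temps.get("Sun", 0.0))    # default for a missing label, like dict.get
print(list(temps.items()))      # (label, value) pairs

# Beyond a dict: operations on all values at once
print(temps.mean(), temps.sort_values().index.tolist())
```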

In [12]:
i=[2,4,6,8,10,12,14]

data=pd.Series(i)

print(data,type(data))

0     2
1     4
2     6
3     8
4    10
5    12
6    14
dtype: int64 <class 'pandas.core.series.Series'>


`data` is an object of the class `pandas.core.series.Series`.

`pandas` is the module name (imported here as `pd`).

`Series()` is a predefined class in the pandas module; calling it creates a Series object.

The data passed to it can be a list, tuple, NumPy ndarray, dict, scalar, and so on.

The index (the left-hand column in the output) labels each value; by default it is the position 0, 1, 2, and so on.
`dtype` is the data type of the values (e.g. int32, int64, float32, float64).
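A sketch of `pd.Series` built from the input types listed above, printing the resulting index and dtype of each. The values are arbitrary. The integer dtype of the ndarray case depends on the platform and NumPy version, so it is not fixed here.

```python
import numpy as np
import pandas as pd

from_list = pd.Series([2, 4, 6])                         # default index 0, 1, 2
from_tuple = pd.Series((1.5, 2.5))                       # float64
from_array = pd.Series(np.arange(3), index=["a", "b", "c"])
from_dict = pd.Series({"x": 1, "y": 2})                  # keys become the index
from_scalar = pd.Series(7, index=["p", "q", "r"])        # scalar repeated for each label

for s in (from_list, from_tuple, from_array, from_dict, from_scalar):
    print(s.index.tolist(), s.dtype)
```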

In [14]:
Fruits=["Apples","Bananas","Oranges","WaterMelon","Cherry","Mango","Kiwi","Grapes","Strawberry"]

In [15]:
ser=pd.Series(Fruits)
print(ser)

0        Apples
1       Bananas
2       Oranges
3    WaterMelon
4        Cherry
5         Mango
6          Kiwi
7        Grapes
8    Strawberry
dtype: object


In [16]:
ser[4]

'Cherry'

In [17]:
ser[7]

'Grapes'

In [18]:
ser[3:7]

3    WaterMelon
4        Cherry
5         Mango
6          Kiwi
dtype: object

In [20]:
ser

0        Apples
1       Bananas
2       Oranges
3    WaterMelon
4        Cherry
5         Mango
6          Kiwi
7        Grapes
8    Strawberry
dtype: object

In [21]:
ser[3:-2]

3    WaterMelon
4        Cherry
5         Mango
6          Kiwi
dtype: object

In [None]:
ser

In [22]:
ser[-5:-2] # positions -5 up to but not including -2, i.e. positions 4, 5, 6

4    Cherry
5     Mango
6      Kiwi
dtype: object

In [23]:
for i in ser:
    print(i)

Apples
Bananas
Oranges
WaterMelon
Cherry
Mango
Kiwi
Grapes
Strawberry


In [24]:
ser[3]="PineApple"

In [25]:
print(ser)

0        Apples
1       Bananas
2       Oranges
3     PineApple
4        Cherry
5         Mango
6          Kiwi
7        Grapes
8    Strawberry
dtype: object


In [26]:
ser[2:6]="Pomegranate"

In [27]:
print(ser)

0         Apples
1        Bananas
2    Pomegranate
3    Pomegranate
4    Pomegranate
5    Pomegranate
6           Kiwi
7         Grapes
8     Strawberry
dtype: object


In [28]:
b=[1,2,3,5,6,8,89]
var1=pd.Series(b,index=[9,8,7,6,5,4,3],dtype=float)
print(var1)

9     1.0
8     2.0
7     3.0
6     5.0
5     6.0
4     8.0
3    89.0
dtype: float64


In [33]:
var1[[9,5]]

9    1.0
5    6.0
dtype: float64

In [34]:
var1[5:8]  # with an integer index, a [] slice is positional (positions 5 and 6), not label-based

4     8.0
3    89.0
dtype: float64
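Because `var1` has an integer index, plain `[]` is ambiguous. A single key such as `var1[5]` or a list such as `var1[[9,5]]` is treated as a label, but the slice `var1[5:8]` was taken by position. `.loc` (labels, end included) and `.iloc` (positions, end excluded) make the intent explicit. This sketch rebuilds the same `var1`:

```python
import pandas as pd

var1 = pd.Series([1, 2, 3, 5, 6, 8, 89], index=[9, 8, 7, 6, 5, 4, 3], dtype=float)

by_label = var1.loc[5:3]      # labels 5, 4, 3 (end label included)
by_position = var1.iloc[5:8]  # positions 5 and 6 (end excluded), same as var1[5:8]
print(by_label)
print(by_position)
```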

In [35]:
print(var1*2)

9      2.0
8      4.0
7      6.0
6     10.0
5     12.0
4     16.0
3    178.0
dtype: float64


In [36]:
a=(12,3,4,5,6,12.5)
print(a,type(a))

(12, 3, 4, 5, 6, 12.5) <class 'tuple'>


In [37]:
rt=pd.Series(a)
print(rt)

0    12.0
1     3.0
2     4.0
3     5.0
4     6.0
5    12.5
dtype: float64


In [38]:
[rt>4]

[0     True
 1    False
 2    False
 3     True
 4     True
 5     True
 dtype: bool]

In [39]:
print(rt[rt>5])

0    12.0
4     6.0
5    12.5
dtype: float64


In [42]:
a={"One":1,"Two":2,"Three":3,"Four":4,"Five":5,"Six":6,"Seven":7,"Eight":8}

In [43]:
value=pd.Series(a)
print(value)

One      1
Two      2
Three    3
Four     4
Five     5
Six      6
Seven    7
Eight    8
dtype: int64


In [46]:
import numpy as np

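# Mixing numbers and strings makes NumPy convert every element to a string
a=np.array([[1,2,3],[4,5,"Bye"],["Hello",8,9]])

# Fails: a string array cannot be multiplied by an integer
print(a*2)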

UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U11'), dtype('int32')) -> None

In [45]:
b=[1,2,3,5,"Hello","Bye",6,8,89]
var1=pd.Series(b,index=[9,8,7,6,5,4,3,2,1])
print(var1)

9        1
8        2
7        3
6        5
5    Hello
4      Bye
3        6
2        8
1       89
dtype: object


In [47]:
print(var1*2)

9             2
8             4
7             6
6            10
5    HelloHello
4        ByeBye
3            12
2            16
1           178
dtype: object


In [49]:
value

One      1
Two      2
Three    3
Four     4
Five     5
Six      6
Seven    7
Eight    8
dtype: int64

In [48]:
print(value.mean())

4.5


In [50]:
print(value.max())

8


In [51]:
print(value.min())

1


In [52]:
print(value.std())

2.449489742783178


In [53]:
print(value+5)

One       6
Two       7
Three     8
Four      9
Five     10
Six      11
Seven    12
Eight    13
dtype: int64


In [56]:
print(value-6)

One     -5
Two     -4
Three   -3
Four    -2
Five    -1
Six      0
Seven    1
Eight    2
dtype: int64


In [57]:
value

One      1
Two      2
Three    3
Four     4
Five     5
Six      6
Seven    7
Eight    8
dtype: int64

In [58]:
print(value>5)

One      False
Two      False
Three    False
Four     False
Five     False
Six       True
Seven     True
Eight     True
dtype: bool


In [61]:
b=[67,54,23,99,76,58,37]
var2=pd.Series(b,index=[9,8,7,6,5,4,3],dtype=float)
print(var2)

9    67.0
8    54.0
7    23.0
6    99.0
5    76.0
4    58.0
3    37.0
dtype: float64


In [62]:
print(var2.sort_values())

7    23.0
3    37.0
8    54.0
4    58.0
9    67.0
5    76.0
6    99.0
dtype: float64


In [65]:
print(var2.nunique())

7


In [64]:
print(var2.unique())

[67. 54. 23. 99. 76. 58. 37.]


In [66]:
print(len(var2))

7


In [71]:
a=[1,2,None,4,5,None,7,8,9]

In [72]:
b=pd.Series(a)

print(b)

0    1.0
1    2.0
2    NaN
3    4.0
4    5.0
5    NaN
6    7.0
7    8.0
8    9.0
dtype: float64


In [73]:
print(b.isnull())

0    False
1    False
2     True
3    False
4    False
5     True
6    False
7    False
8    False
dtype: bool


In [74]:
print(b.isnull().sum())

2


In [75]:
b.mean()

5.142857142857143

In [76]:
print(b.fillna(b.mean()))

0    1.000000
1    2.000000
2    5.142857
3    4.000000
4    5.000000
5    5.142857
6    7.000000
7    8.000000
8    9.000000
dtype: float64


In [79]:
def square(x):
    return x**2

In [78]:
a=[1,2,3,4,5,6,6,7]


b=pd.Series(a)

print(b)

0    1
1    2
2    3
3    4
4    5
5    6
6    6
7    7
dtype: int64


In [80]:
print(b.apply(square))

0     1
1     4
2     9
3    16
4    25
5    36
6    36
7    49
dtype: int64


In [82]:
s=['12-03-2024','03-04-2023','06-05-2021']

d=pd.Series(s)

print(d)

0    12-03-2024
1    03-04-2023
2    06-05-2021
dtype: object


In [83]:
v=pd.to_datetime(s)  # month first by default: '12-03-2024' becomes 2024-12-03

In [84]:
print(v)

DatetimeIndex(['2024-12-03', '2023-03-04', '2021-06-05'], dtype='datetime64[ns]', freq=None)


In [86]:
v.day_of_year

Int64Index([338, 63, 156], dtype='int64')

In [87]:
v.year

Int64Index([2024, 2023, 2021], dtype='int64')
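These date strings are ambiguous. pandas read `'12-03-2024'` as month-day-year (December 3), so if the data is actually day-month-year, an explicit `format` (or `dayfirst=True`) avoids a silent misread. Calling `pd.to_datetime` on the Series `d` rather than the list `s` returns a Series, whose date parts are available through the `.dt` accessor. Whether these dates are day-first is an assumption in this sketch.

```python
import pandas as pd

d = pd.Series(['12-03-2024', '03-04-2023', '06-05-2021'])

parsed = pd.to_datetime(d, format="%d-%m-%Y")   # assume day-month-year
print(parsed)
print(parsed.dt.year.tolist(), parsed.dt.month.tolist(), parsed.dt.day_of_year.tolist())
```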

In [None]:
# Desired layout: a 3 by 3 table with row and column labels
#   0 1 2
# 0 1 2 3
# 1 4 5 6
# 2 7 8 9

In [69]:
a=[[1,2,3],[4,5,6],[7,8,9]]

b=pd.Series(a)

print(b)

0    [1, 2, 3]
1    [4, 5, 6]
2    [7, 8, 9]
dtype: object
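A Series of lists keeps each inner list as one object (dtype `object`), so it does not give the 3 by 3 table layout. Passing the same nested list to `pd.DataFrame` does:

```python
import pandas as pd

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df = pd.DataFrame(a)   # each inner list becomes a row; columns are labelled 0, 1, 2
print(df)
print(df.shape)        # (rows, columns)
```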


In [70]:
a={1,2,4,5,6,7,8,9}

b=pd.Series(a)

print(b)

TypeError: 'set' type is unordered
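pandas rejects a set because a set has no defined order, so there is no meaningful position to assign to each value. Converting it to a list first, sorted if a predictable order matters, works:

```python
import pandas as pd

a = {1, 2, 4, 5, 6, 7, 8, 9}
b = pd.Series(sorted(a))   # sorted() returns a list with a fixed order
print(b)
```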

In [2]:
import pandas as pd

In [4]:
series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
filtered_series = series[series > 5]
print(filtered_series)


5    6
6    7
7    8
8    9
dtype: int64


In [5]:
series = pd.Series(['dog', 'cat', 'elephant', 'rabbit'])
print(series.str.upper())


0         DOG
1         CAT
2    ELEPHANT
3      RABBIT
dtype: object


In [6]:
series = pd.Series([1, 2, 3, 4, 5])
print(series.cumsum())


0     1
1     3
2     6
3    10
4    15
dtype: int64


In [7]:
series = pd.Series([1, 2, 3, 4, 5])
print(series.replace({1: 'a', 5: 'e'}))


0    a
1    2
2    3
3    4
4    e
dtype: object


In [8]:
series = pd.Series([1, 2, 2, 3, 4, 4, 5])
print(series.unique())


[1 2 3 4 5]


In [9]:
series = pd.Series([1, 2, 3, 4, 5])
print(series.describe())


count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64


In [10]:
series = pd.Series([1, 1, 2, 2, 2, 3, 4])
print(series.value_counts())


2    3
1    2
3    1
4    1
dtype: int64


In [11]:
series = pd.Series([1, 2, 3, 4, 5])
print(series.shift(1))


0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64


In [12]:
series = pd.Series([100, 300, 200, 400, 100])
print(series.rank())


0    1.5
1    4.0
2    3.0
3    5.0
4    1.5
dtype: float64


In [13]:
series = pd.Series([1, 2, None, 4, 5, None, 7])
print(series.interpolate())


0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
dtype: float64


In [14]:
series = pd.Series([1, 1, 2, 3, 3, 4])
print(series.duplicated())


0    False
1     True
2    False
3    False
4     True
5    False
dtype: bool


In [15]:
series = pd.Series([1, 2, 3, 4, 5])
print(series.idxmax())


4


In [16]:
series = pd.Series([1, 2, 3, 4, 5])
print(series.idxmin())


0


In [17]:
series = pd.Series([1, 2, 2, 3, 4, 4, 4])
print(series.mode())


0    4
dtype: int64


In [18]:
series = pd.Series([1, None, 3, None, 5])
print(series.ffill())


0    1.0
1    1.0
2    3.0
3    3.0
4    5.0
dtype: float64


In [19]:
series = pd.Series([1, None, 3, None, 5])
print(series.bfill())


0    1.0
1    3.0
2    3.0
3    5.0
4    5.0
dtype: float64


In [20]:
series = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
reindexed_series = series.reindex(['a', 'b', 'c', 'd'])
print(reindexed_series)


a    1.0
b    2.0
c    3.0
d    NaN
dtype: float64
