
### **About Pandas Library**

Pandas is a powerful and widely used open-source Python library for data manipulation and analysis. It provides easy-to-use data structures and functions for working with structured data. The key components and features of the library are described below:

1. **DataFrame**:

    - The DataFrame is the primary data structure in Pandas. It represents tabular data similar to a spreadsheet or SQL table.
    - DataFrame consists of rows and columns, where each column can have a different data type.
    - DataFrames can be created from various data sources such as CSV files, Excel files, SQL databases, Python dictionaries, and NumPy arrays.
    - DataFrames provide powerful methods for indexing, slicing, filtering, joining, merging, and reshaping data.

2. **Series**:

    - A Series is a one-dimensional labeled array that can hold data of any type.
    - Series are the building blocks of DataFrames. Each column in a DataFrame is a Series.
    - Series can be created from Python lists, NumPy arrays, or dictionaries.

3. **Indexing and Selection**:

    - Pandas provides various methods for indexing and selecting data from DataFrames and Series.
    - This includes label-based indexing (`loc`), integer-based indexing (`iloc`), boolean indexing, and attribute-based access.

4. **Data Cleaning and Preparation**:

    - Pandas offers functions for handling missing data, duplicate data, and data type conversion.
    - Methods like `dropna()`, `fillna()`, `drop_duplicates()`, `astype()`, etc., are commonly used for data cleaning.

5. **Data Manipulation**:

    - Pandas provides a wide range of functions for data manipulation, including filtering rows, selecting columns, sorting, grouping, aggregating, and pivoting.
    - Methods like `groupby()`, `merge()`, `concat()`, `pivot_table()`, `stack()`, `unstack()`, etc., are used for various data manipulation tasks.

6. **Data Visualization**:

    - Pandas uses Matplotlib as its default plotting backend, and its objects can be passed directly to many other visualization libraries.
    - DataFrames and Series have built-in plotting methods (`plot()`), which provide a quick way to create basic plots such as line plots, scatter plots, histograms, etc.

7. **Time Series Data**:

    - Pandas has extensive support for working with time series data.
    - It provides specialized data structures like `Timestamp` and `DatetimeIndex` for representing dates and times.
    - Pandas offers time series-specific functions for resampling, shifting, rolling window calculations, etc.

8. **Input/Output**:

    - Pandas supports reading and writing data from/to various file formats such as CSV, Excel, SQL databases, JSON, HTML, and HDF5.
    - Methods like `read_csv()`, `read_excel()`, `to_csv()`, `to_excel()`, etc., facilitate input/output operations.

9. **Performance and Efficiency**:

    - Pandas is built for efficient work on in-memory datasets, including fairly large ones, as long as they fit in RAM.
    - Its core operations are vectorized: they run in compiled code (largely via NumPy) instead of Python-level loops, which makes them much faster than equivalent pure-Python loops.

In summary, Pandas is a versatile library for data manipulation and analysis in Python, offering a wide range of functionality for working with structured data efficiently and effectively. Whether you're cleaning messy data, performing complex transformations, or conducting in-depth analysis, Pandas provides the tools you need to get the job done.
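Several of the components listed above can be seen together in one small sketch: a DataFrame built from a dictionary, a Series built from a list, `loc`/`iloc` indexing, boolean indexing and `fillna()`. The country names and values are made up for illustration and are not taken from the notebook's dataset.

```python
import numpy as np
import pandas as pd

# A DataFrame built from a Python dictionary: each key becomes a column.
# (Illustrative values only; one life expectancy is deliberately missing.)
df_demo = pd.DataFrame({
    "country": ["India", "Japan", "Chad", "Peru"],
    "continent": ["Asia", "Asia", "Africa", "Americas"],
    "life_exp": [64.7, 82.6, np.nan, 71.4],
})

# A Series built from a list with custom labels; every DataFrame column is also a Series.
scores = pd.Series([10, 20, 30], index=["a", "b", "c"])
column = df_demo["life_exp"]

# Label-based (loc) vs position-based (iloc) indexing.
first_country = df_demo.loc[0, "country"]   # row label 0, column "country"
last_row = df_demo.iloc[-1]                 # last row, selected by position

# Boolean indexing: keep only the Asian rows.
asia = df_demo[df_demo["continent"] == "Asia"]

# Data cleaning: fill the missing life expectancy with the column mean.
filled = df_demo.fillna({"life_exp": df_demo["life_exp"].mean()})

print(type(column).__name__)
print(first_country, last_row["country"])
print(asia["country"].tolist())
```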

### **Why Pandas instead of NumPy, and what are the limitations of NumPy?**

Pandas and NumPy are both popular libraries in Python for data manipulation and analysis, but they serve different purposes and have different strengths.

NumPy:

1. **Numeric Operations**: NumPy is optimized for numerical operations. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

2. **Low-Level**: NumPy provides low-level data structures, allowing for more control over memory usage and performance optimizations.

3. **Efficiency**: NumPy operations are typically much faster than equivalent pure-Python loops because they run in compiled C code over contiguous, fixed-type arrays.

4. **Array-based**: NumPy primarily focuses on array-based computation, making it ideal for numerical computing tasks.

However, NumPy has some limitations:

1. **Structured Data Handling**: NumPy arrays are designed around a single data type. Structured (record) arrays can hold mixed types, but NumPy offers no convenient high-level tabular structure comparable to a data frame.

2. **Data Manipulation**: NumPy doesn't have built-in functionality for data manipulation tasks like grouping, pivoting, or joining datasets, which are common in data analysis workflows.

3. **Label-based Indexing**: NumPy arrays are indexed using integer indices, making it less intuitive for tasks that require label-based indexing, such as accessing data by column names.

Pandas:

1. **High-Level Data Structures**: Pandas provides high-level data structures like Series and DataFrame that are designed for handling structured data efficiently. These data structures are built on top of NumPy arrays, providing a more intuitive interface for data manipulation.

2. **Data Analysis Tools**: Pandas offers a wide range of tools for data analysis, including data cleaning, transformation, aggregation, and visualization.

3. **Label-based Indexing**: Pandas supports label-based indexing, allowing users to access data by column names or row labels, which makes data manipulation tasks more intuitive.

4. **Integration with Other Libraries**: Pandas integrates well with other Python libraries like Matplotlib, Scikit-learn, and Statsmodels, making it a powerful tool for data analysis and machine learning tasks.

In summary, while NumPy excels in numerical computation and array manipulation, Pandas is better suited for data manipulation and analysis tasks, especially when dealing with structured data. Combining both libraries allows for a comprehensive data analysis workflow in Python.
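The contrast above can be sketched with the same tiny table stored both ways. The numbers are illustrative (loosely rounded, with the Albania value made up); the point is that NumPy reaches columns by position only, while Pandas reaches them by name, allows mixed types, and makes grouping a one-liner.

```python
import numpy as np
import pandas as pd

# A NumPy array: columns are reached by position, so we must remember
# that column 1 holds life expectancy. All values share one dtype (float).
arr = np.array([[1952, 28.8], [1957, 30.3], [1962, 32.0]])
life_np = arr[:, 1]

# A Pandas DataFrame: columns carry names and can have different types.
df_small = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Albania"],
    "year": [1952, 1957, 1952],
    "life_exp": [28.8, 30.3, 55.2],
})
life_pd = df_small["life_exp"]  # selected by column name

# Grouping and aggregation: mean life expectancy per country.
mean_by_country = df_small.groupby("country")["life_exp"].mean()
print(mean_by_country)
```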

In [68]:
!pip install pandas



In [69]:
import numpy as np
import pandas as pd

In [70]:
#!wget or !gdown can be used for downloading the data.
!gdown "1E3bwvYGf1ig32RmcYiWc0IXPN-mD_bI_"

#CSV = comma separated values.

Downloading...
From: https://drive.google.com/uc?id=1E3bwvYGf1ig32RmcYiWc0IXPN-mD_bI_
To: /content/mckinsey.csv
100% 83.8k/83.8k [00:00<00:00, 5.36MB/s]


In [71]:
df = pd.read_csv('mckinsey.csv')
df

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [72]:
print(type(df))

# df is a DataFrame.
# A DataFrame is always two-dimensional (rows and columns).

<class 'pandas.core.frame.DataFrame'>


In Pandas, both Series and DataFrame are fundamental data structures, but they serve different purposes and have different characteristics.

1. **Series**:

    - A Series is a one-dimensional labeled array capable of holding data of any type (integer, float, string, etc.).
    - It can be thought of as a column in a spreadsheet or a single variable in a dataset.
    - Each element in a Series has a label, and together these labels form the index. Labels are usually unique but are not required to be; the index enables fast, label-based data retrieval.
    - Series provides various methods and attributes for data manipulation, such as indexing, slicing, arithmetic operations, etc.
    - Series is typically used for representing a single variable or a column in a dataset.

2. **DataFrame**:

    - A DataFrame is a two-dimensional labeled data structure with columns of potentially different data types.
    - It can be thought of as a table or a spreadsheet, where each column is a Series.
    - DataFrame provides a tabular representation of data, allowing for easy visualization and manipulation.
    - It has both row and column indices, allowing for label-based indexing and slicing along both dimensions.
    - DataFrame supports various operations for data manipulation, such as merging, joining, grouping, reshaping, etc.
    - DataFrame is suitable for handling structured data with multiple variables or features.

In summary, Series is a one-dimensional labeled array, while DataFrame is a two-dimensional labeled data structure resembling a table. Series represents a single column of data, whereas DataFrame represents a collection of columns, forming a dataset. Both Series and DataFrame are essential components of data analysis in Pandas, with DataFrame being more commonly used for handling structured datasets.
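A short sketch of how the two structures relate: two Series that share index labels are placed side by side with `pd.concat(..., axis=1)`, which yields a DataFrame whose columns are again Series. The values and the `pop_millions` name are illustrative assumptions. The last lines show that Series labels need not be unique.

```python
import pandas as pd

# Two Series sharing the same index labels (illustrative values).
life = pd.Series([28.8, 55.2], index=["Afghanistan", "Albania"], name="life_exp")
pop = pd.Series([8.4, 1.3], index=["Afghanistan", "Albania"], name="pop_millions")

# Label-based access on a Series.
albania_life = life["Albania"]

# Concatenating Series along axis=1 produces a DataFrame: the Series names
# become column names and the shared labels become the row index.
table = pd.concat([life, pop], axis=1)

# Each column of the resulting DataFrame is a Series again.
column_type = type(table["life_exp"]).__name__

# Index labels do not have to be unique: a repeated label selects every match.
dup = pd.Series([1, 2, 3], index=["x", "x", "y"])
repeated = dup["x"].tolist()

print(table)
print(albania_life, column_type, repeated)
```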

In [73]:
df['life_exp'] # Selects only the life_exp column; the result is a Series.

0       28.801
1       30.332
2       31.997
3       34.020
4       36.088
         ...  
1699    62.351
1700    60.377
1701    46.809
1702    39.989
1703    43.487
Name: life_exp, Length: 1704, dtype: float64

In [74]:
print(type(df['life_exp']))

# This is a Series (a single column).

# A Series is like a 1-D NumPy array (a vector), but each value also carries an index label.


<class 'pandas.core.series.Series'>


In [75]:
df['continent']

# This is a Series (1-D).
# Combining several Series side by side (e.g. with pd.concat([...], axis=1)) produces a DataFrame.
# Tip (Colab): hovering over a DataFrame's output table highlights its rows; a Series is printed as plain text, so nothing is highlighted.

0         Asia
1         Asia
2         Asia
3         Asia
4         Asia
         ...  
1699    Africa
1700    Africa
1701    Africa
1702    Africa
1703    Africa
Name: continent, Length: 1704, dtype: object

In [76]:
df['country']

# Column names are case-sensitive.

0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: country, Length: 1704, dtype: object

In [77]:
# df['Country']  # Raises KeyError: the column is named 'country' with a lowercase 'c'.
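A minimal sketch of the case-sensitivity pitfall on a made-up two-column DataFrame: the wrong capitalization raises `KeyError`, while a membership test on `.columns` or `DataFrame.get()` (which returns `None` for a missing key) lets you check without an exception.

```python
import pandas as pd

df_demo = pd.DataFrame({"country": ["Chad"], "year": [1952]})

has_lower = "country" in df_demo.columns   # exact, case-sensitive match
has_upper = "Country" in df_demo.columns

# Square-bracket selection with the wrong capitalization raises KeyError.
try:
    df_demo["Country"]
    key_error = False
except KeyError:
    key_error = True

# .get() returns None (or a supplied default) instead of raising.
missing = df_demo.get("Country")

print(has_lower, has_upper, key_error, missing)
```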

In [78]:
df.head()

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


In [79]:
df.shape

(1704, 6)

In [80]:
# To see only the column names, use the .columns property.

df.columns

Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap'], dtype='object')

### **About property and function in pandas**

In Pandas, properties and functions play different roles in the manipulation and analysis of data. Let's define each:

1. **Property**:

    - A property in Pandas refers to an attribute or characteristic of a DataFrame or Series object.
    - Properties provide access to metadata or information about the data without executing any computation.
    - Examples of properties in Pandas include `shape`, `dtypes`, `index`, and `columns` for DataFrames, and `dtype`, `name`, `index`, and `values` for Series.
    - Properties are accessed without parentheses, similar to accessing attributes of an object in Python.
    - They provide information about the structure, size, and data types of the DataFrame or Series.

2. **Function**:

    - A function in Pandas refers to a callable object that performs a specific action or computation on a DataFrame or Series.
    - Functions in Pandas are typically used to transform, manipulate, analyze, or visualize data.
    - Functions can be methods of DataFrame or Series objects, or they can be standalone functions that accept DataFrame or Series objects as arguments.
    - Examples of functions in Pandas include `head()`, `tail()`, `describe()`, `info()`, `groupby()`, `apply()`, `plot()`, and many others.
    - Functions are invoked with parentheses, and they may accept arguments to customize their behavior.
    - They perform operations on the data, such as summarization, aggregation, filtering, sorting, merging, reshaping, and visualization.

In summary, properties in Pandas provide information about the data structure and metadata, while functions perform operations and computations on the data itself. Understanding the distinction between properties and functions is essential for effectively working with Pandas objects and analyzing data.
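The distinction can be checked directly on a small made-up DataFrame: properties such as `shape` and `columns` are read without parentheses, methods such as `head()` are called, and calling a property fails because `shape` is just a tuple.

```python
import pandas as pd

df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Properties: accessed without parentheses.
shape = df_demo.shape            # a plain tuple (rows, columns)
names = list(df_demo.columns)

# Methods: called with parentheses, optionally with arguments.
first_two = df_demo.head(2)

# Without parentheses, a method name gives back the method object itself.
head_is_callable = callable(df_demo.head)

# Calling a property like a method fails, because a tuple is not callable.
try:
    df_demo.shape()
    shape_call_failed = False
except TypeError:
    shape_call_failed = True

print(shape, names, len(first_two), head_is_callable, shape_call_failed)
```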

In [81]:
# .info() prints a concise summary: index range, column names, non-null counts, dtypes and memory usage.
# .info() is a method (called with parentheses), whereas .shape and .columns are properties.


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     1704 non-null   object 
 1   year        1704 non-null   int64  
 2   population  1704 non-null   int64  
 3   continent   1704 non-null   object 
 4   life_exp    1704 non-null   float64
 5   gdp_cap     1704 non-null   float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB


In [82]:
# df.head() returns the first 5 rows by default.
df.head()

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


In [83]:
df.head(10) #To print first 10 rows

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
5,Afghanistan,1977,14880372,Asia,38.438,786.11336
6,Afghanistan,1982,12881816,Asia,39.854,978.011439
7,Afghanistan,1987,13867957,Asia,40.822,852.395945
8,Afghanistan,1992,16317921,Asia,41.674,649.341395
9,Afghanistan,1997,22227415,Asia,41.763,635.341351


In [84]:
# df.tail() returns the last 5 rows by default.

df.tail()

,country,year,population,continent,life_exp,gdp_cap
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.44996
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623
1703,Zimbabwe,2007,12311143,Africa,43.487,469.709298


In [85]:
df.tail(20) #To print last 20 rows

,country,year,population,continent,life_exp,gdp_cap
1684,Zambia,1972,4506497,Africa,50.107,1773.498265
1685,Zambia,1977,5216550,Africa,51.386,1588.688299
1686,Zambia,1982,6100407,Africa,51.821,1408.678565
1687,Zambia,1987,7272406,Africa,50.821,1213.315116
1688,Zambia,1992,8381163,Africa,46.1,1210.884633
1689,Zambia,1997,9417789,Africa,40.238,1071.353818
1690,Zambia,2002,10595811,Africa,39.193,1071.613938
1691,Zambia,2007,11746035,Africa,42.384,1271.211593
1692,Zimbabwe,1952,3080907,Africa,48.451,406.884115
1693,Zimbabwe,1957,3646340,Africa,50.469,518.764268


In [86]:
'''
describe() generates descriptive statistics for a DataFrame or Series.
By default it summarizes only the numeric columns: count, mean, standard deviation,
minimum, quartiles (25%, 50%, 75%) and maximum, giving a quick overview of
central tendency and spread.
'''

df.describe()

,year,population,life_exp,gdp_cap
count,1704.0,1704.0,1704.0,1704.0
mean,1979.5,29601210.0,59.474439,7215.327081
std,17.26533,106157900.0,12.917107,9857.454543
min,1952.0,60011.0,23.599,241.165876
25%,1965.75,2793664.0,48.198,1202.060309
50%,1979.5,7023596.0,60.7125,3531.846988
75%,1993.25,19585220.0,70.8455,9325.462346
max,2007.0,1318683000.0,82.603,113523.1329
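Because `describe()` covers only numeric columns by default, `country` and `continent` are missing from the summary above. A small sketch on made-up data of how `include="all"` adds the text columns, which are summarized with `count`, `unique`, `top` (most frequent value) and `freq` (its count):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "continent": ["Asia", "Asia", "Africa", "Europe"],
    "life_exp": [28.8, 30.3, 43.1, 68.0],
})

numeric_summary = df_demo.describe()              # numeric columns only
full_summary = df_demo.describe(include="all")    # text columns as well

print(full_summary)
```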


# Working with Columns

In [87]:
df.columns # Returns only the column names.

Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap'], dtype='object')

In [88]:
df['country'].head()
# First 5 rows of the country column only; the result is a Series (1-D).

0    Afghanistan
1    Afghanistan
2    Afghanistan
3    Afghanistan
4    Afghanistan
Name: country, dtype: object

In [89]:
colms = df[['country','continent']]
colms

# Selecting with a list of column names returns a DataFrame (2-D) containing only those columns.

,country,continent
0,Afghanistan,Asia
1,Afghanistan,Asia
2,Afghanistan,Asia
3,Afghanistan,Asia
4,Afghanistan,Asia
...,...,...
1699,Zimbabwe,Africa
1700,Zimbabwe,Africa
1701,Zimbabwe,Africa
1702,Zimbabwe,Africa


In [90]:
colms.head()

,country,continent
0,Afghanistan,Asia
1,Afghanistan,Asia
2,Afghanistan,Asia
3,Afghanistan,Asia
4,Afghanistan,Asia


In [91]:
df[["country"]] # Passing the column name inside a list returns a DataFrame, not a Series.

,country
0,Afghanistan
1,Afghanistan
2,Afghanistan
3,Afghanistan
4,Afghanistan
...,...
1699,Zimbabwe
1700,Zimbabwe
1701,Zimbabwe
1702,Zimbabwe


In [92]:
df[['country']].shape # DataFrame shape (rows, columns): selecting with a list gives a 2-D result

(1704, 1)

In [93]:
df['country'].shape # Series shape: a one-element tuple, since a Series is 1-D

(1704,)
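The same single-label vs list-of-labels rule can be reproduced on a tiny made-up DataFrame, together with the two usual conversions between the forms: `Series.to_frame()` and `DataFrame.squeeze("columns")`.

```python
import pandas as pd

df_demo = pd.DataFrame({"country": ["Chad", "Peru"], "year": [1952, 1957]})

as_series = df_demo["country"]     # single label   -> Series
as_frame = df_demo[["country"]]    # list of labels -> DataFrame

print(type(as_series).__name__, as_series.shape)   # Series (2,)
print(type(as_frame).__name__, as_frame.shape)     # DataFrame (2, 1)

# Converting between the two forms.
frame_again = as_series.to_frame()            # Series -> one-column DataFrame
series_again = as_frame.squeeze("columns")    # one-column DataFrame -> Series
```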

In [94]:
df['country'].unique() # .unique() returns the distinct country names as a NumPy array, in order of first appearance.

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Australia', 'Austria', 'Bahrain', 'Bangladesh', 'Belgium',
       'Benin', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'China',
       'Colombia', 'Comoros', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Czech Republic',
       'Denmark', 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Ethiopia',
       'Finland', 'France', 'Gabon', 'Gambia', 'Germany', 'Ghana',
       'Greece', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Haiti',
       'Honduras', 'Hong Kong, China', 'Hungary', 'Iceland', 'India',
       'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kenya', 'Korea, Dem. Rep.',
       'Korea, Rep.', 'Kuwait', 'Leba

In [95]:
df['country'].nunique() # .nunique() returns the number of distinct countries.

142
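A minimal sketch, on a made-up Series with one missing value, of two details worth knowing about these methods: `unique()` keeps the order of first appearance (it does not sort) and includes the missing value, while `nunique()` skips missing values unless `dropna=False` is passed.

```python
import pandas as pd

s = pd.Series(["Chad", "Peru", "Chad", None, "Fiji"])

distinct = s.unique()                      # first-appearance order; missing value kept
n_distinct = s.nunique()                   # missing values not counted by default
n_with_missing = s.nunique(dropna=False)   # missing value counted as its own category

print(distinct)
print(n_distinct, n_with_missing)          # 3 4
```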

**`Series.value_counts()`** counts the occurrences of each unique value in a Series. (A top-level `pd.value_counts()` function also exists but is deprecated in recent pandas versions; prefer the Series method.) Here's a short description of its functionality:

- **Method**: `Series.value_counts()`
- **Input**: A Pandas Series object (for example, a single DataFrame column).
- **Output**: Returns a new Series containing the count of each unique value, sorted in descending order of frequency by default.
- **Usage**:
  - Often used to examine the frequency distribution of categorical data.
  - It can be applied to a single column of a DataFrame to count occurrences of each unique value in that column.
- **Parameters**:
  - `normalize`: If True, returns relative frequencies (proportions) instead of counts. Default is False.
  - `sort`: If True (the default), sorts the result by frequency.
  - `ascending`: If True, sorts in ascending order of frequency instead of descending. Default is False.
  - `dropna`: If True (the default), missing values (NaN) are not counted.
- **Example**:
  ```python
  import pandas as pd

  # Create a Series
  data = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])
  
  # Count the occurrences of unique values
  counts = data.value_counts()

  print(counts)
  ```
  Output:
  ```
  A    3
  B    2
  C    1
  dtype: int64
  ```
- **Note**: It returns a Series with the unique values as index labels and their respective counts as values. In pandas 2.0 and later, the last line of the printed result reads `Name: count, dtype: int64`.
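The `normalize`, `dropna` and `ascending` parameters described above can be tried on a variant of the same example, with one missing value added (the data is made up):

```python
import pandas as pd

data = pd.Series(["A", "B", "A", None, "A", "B"])

counts = data.value_counts()                     # NaN excluded, most frequent first
shares = data.value_counts(normalize=True)       # proportions of the non-missing values
with_missing = data.value_counts(dropna=False)   # the missing value gets its own count
rarest_first = data.value_counts(ascending=True) # least frequent first

print(counts)
print(shares)
```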

In [96]:
# One of the most frequently used methods for exploring a categorical column.


df['country'].value_counts(ascending = False)

Afghanistan          12
Pakistan             12
New Zealand          12
Nicaragua            12
Niger                12
                     ..
Eritrea              12
Equatorial Guinea    12
El Salvador          12
Egypt                12
Zimbabwe             12
Name: country, Length: 142, dtype: int64

In [97]:
print(df['continent'].value_counts())

print()

print(df['continent'].value_counts(ascending = True))

# By default value_counts() sorts in decreasing order of count; ascending=True reverses this.

Africa      624
Asia        396
Europe      360
Americas    300
Oceania      24
Name: continent, dtype: int64

Oceania      24
Americas    300
Europe      360
Asia        396
Africa      624
Name: continent, dtype: int64


In [98]:
df['life_exp'].max() # Gives max value of life_exp column

82.603

In [99]:
df['life_exp'].min() # Gives min value of life_exp column

23.599

In [100]:
df.head()

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


In [101]:
df.rename({'population' : 'POPULATION'},axis = 1)

# rename() returns a new DataFrame with the renamed column.
# The original df is not changed permanently.

,country,year,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [102]:
df = df.rename({'population' : 'POPULATION'},axis = 1)
df

# Assigning the result back to df makes the rename permanent.

,country,year,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [103]:
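# Method 2: pass the mapping to the columns= keyword (equivalent to axis=1).
# Note: df already has 'POPULATION' after the previous cell, so 'population' no longer exists;
# rename() ignores labels that are not found (errors='ignore' by default), so nothing changes here.
df = df.rename(columns = {'population' : 'POPULATION'})
df

,country,year,POPULATION,continent,life_exp,gdp_cap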
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [104]:
df.rename(columns = {'country' : 'COUNTRY','year' : 'YEAR','POPULATION' : 'POPULATION','continent' : 'CONTINENT','life_exp' : 'LIFE_EXP','gdp_cap' : 'GDP_CAP'}).head()

,COUNTRY,YEAR,POPULATION,CONTINENT,LIFE_EXP,GDP_CAP
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


In [105]:
df.head()

,country,year,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


In [106]:
df.rename(columns = {'country': 'COUNTRYY'})

,COUNTRYY,year,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [107]:
df.rename(columns = {'country': 'COUNTRYY'}, inplace = True) # inplace=True modifies df directly and returns None.
df
# By default inplace=False, so rename() returns a modified copy instead.

,COUNTRYY,year,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [108]:
df.rename(columns = {'year' : 'YEAR'}, inplace = True)
df

,COUNTRYY,YEAR,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [109]:
df

,COUNTRYY,YEAR,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623
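The renaming options used in the cells above can be summarized in one sketch on a made-up one-row DataFrame: the `axis=1` and `columns=` forms are equivalent, a function such as `str.upper` can be applied to every column name, and unknown labels are ignored unless `errors="raise"` is passed.

```python
import pandas as pd

df_demo = pd.DataFrame({"country": ["Chad"], "year": [1952]})

# Both forms return a renamed copy; df_demo itself is unchanged.
a = df_demo.rename({"year": "YEAR"}, axis=1)
b = df_demo.rename(columns={"year": "YEAR"})

# A function is applied to every column name.
upper = df_demo.rename(columns=str.upper)

# Labels that do not exist are silently ignored by default...
same = df_demo.rename(columns={"population": "POPULATION"})

# ...unless errors="raise" is passed, which raises KeyError.
try:
    df_demo.rename(columns={"population": "POPULATION"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(a.columns), list(upper.columns), raised)
```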


**`df.drop()`** is a Pandas method used to remove rows or columns from a DataFrame by label. Here's a short description of its functionality:

- **Method**: `df.drop()`
- **Input**:
  - For removing rows: a single label or a list of row (index) labels to drop.
  - For removing columns: a single column name or a list of column names to drop.
- **Output**: Returns a new DataFrame with the specified rows or columns removed (or `None` when `inplace=True`).
- **Usage**:
  - Typically used to eliminate unwanted rows or columns from a DataFrame.
- **Parameters**:
  - `labels`: Label or list of labels to drop; whether they refer to rows or columns depends on `axis`.
  - `axis`: Specifies whether to drop rows (`axis=0`) or columns (`axis=1`). Default is 0 (rows).
  - `index`: Alternative to `labels` with `axis=0` for specifying the row labels to drop.
  - `columns`: Alternative to `labels` with `axis=1` for specifying the column labels to drop.
  - `inplace`: If set to True, modifies the DataFrame in place and returns None.
- **Example**:
  ```python
  import pandas as pd

  # Create a DataFrame
  df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

  # Drop column 'B' from the DataFrame
  df.drop(columns=['B'], inplace=True)

  print(df)
  ```
  Output:
  ```
     A
  0  1
  1  2
  2  3
  ```
- **Note**: By default, `df.drop()` returns a new DataFrame with the specified rows or columns removed, while the original DataFrame remains unchanged. If you want to modify the original DataFrame in place, you can use the `inplace=True` parameter.
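A short sketch, on a made-up DataFrame with string row labels, of dropping a row by label, dropping a column via `axis=1`, and the `errors` parameter: a missing label raises `KeyError` by default, while `errors="ignore"` skips it. Without `inplace=True`, the original DataFrame stays unchanged.

```python
import pandas as pd

df_demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])

no_row_y = df_demo.drop(index=["y"])                 # drop a row by its label
no_col_b = df_demo.drop("B", axis=1)                 # same as drop(columns=["B"])
safe = df_demo.drop(columns=["C"], errors="ignore")  # missing label: nothing happens

# With the default errors="raise", a missing label raises KeyError.
try:
    df_demo.drop(columns=["C"])
    strict_raised = False
except KeyError:
    strict_raised = True

print(no_row_y)
print(no_col_b)
```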

In [110]:
df.drop(columns = ['YEAR'], inplace = True) # Permanently removes the YEAR column from df

In [111]:
df

,COUNTRYY,POPULATION,continent,life_exp,gdp_cap
0,Afghanistan,8425333,Asia,28.801,779.445314
1,Afghanistan,9240934,Asia,30.332,820.853030
2,Afghanistan,10267083,Asia,31.997,853.100710
3,Afghanistan,11537966,Asia,34.020,836.197138
4,Afghanistan,13079460,Asia,36.088,739.981106
...,...,...,...,...,...
1699,Zimbabwe,9216418,Africa,62.351,706.157306
1700,Zimbabwe,10704340,Africa,60.377,693.420786
1701,Zimbabwe,11404948,Africa,46.809,792.449960
1702,Zimbabwe,11926563,Africa,39.989,672.038623


In [112]:
df.drop(columns = ['COUNTRYY'], inplace = True) # inplace=True removes the COUNTRYY column from df itself; with the default inplace=False, a new DataFrame is returned instead
df

,POPULATION,continent,life_exp,gdp_cap
0,8425333,Asia,28.801,779.445314
1,9240934,Asia,30.332,820.853030
2,10267083,Asia,31.997,853.100710
3,11537966,Asia,34.020,836.197138
4,13079460,Asia,36.088,739.981106
...,...,...,...,...
1699,9216418,Africa,62.351,706.157306
1700,10704340,Africa,60.377,693.420786
1701,11404948,Africa,46.809,792.449960
1702,11926563,Africa,39.989,672.038623


In [113]:
df

,POPULATION,continent,life_exp,gdp_cap
0,8425333,Asia,28.801,779.445314
1,9240934,Asia,30.332,820.853030
2,10267083,Asia,31.997,853.100710
3,11537966,Asia,34.020,836.197138
4,13079460,Asia,36.088,739.981106
...,...,...,...,...
1699,9216418,Africa,62.351,706.157306
1700,10704340,Africa,60.377,693.420786
1701,11404948,Africa,46.809,792.449960
1702,11926563,Africa,39.989,672.038623


In [140]:
data = pd.read_csv('mckinsey.csv')
data

,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [141]:
# Creating new columns: assigning to a column name that does not exist yet adds it to the DataFrame.

data['YEAR+10'] = data['year'] + 10
data

,country,year,population,continent,life_exp,gdp_cap,YEAR+10
0,Afghanistan,1952,8425333,Asia,28.801,779.445314,1962
1,Afghanistan,1957,9240934,Asia,30.332,820.853030,1967
2,Afghanistan,1962,10267083,Asia,31.997,853.100710,1972
3,Afghanistan,1967,11537966,Asia,34.020,836.197138,1977
4,Afghanistan,1972,13079460,Asia,36.088,739.981106,1982
...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306,1997
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786,2002
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960,2007
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623,2012


In [142]:
data

,country,year,population,continent,life_exp,gdp_cap,YEAR+10
0,Afghanistan,1952,8425333,Asia,28.801,779.445314,1962
1,Afghanistan,1957,9240934,Asia,30.332,820.853030,1967
2,Afghanistan,1962,10267083,Asia,31.997,853.100710,1972
3,Afghanistan,1967,11537966,Asia,34.020,836.197138,1977
4,Afghanistan,1972,13079460,Asia,36.088,739.981106,1982
...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306,1997
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786,2002
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960,2007
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623,2012


In [143]:
data['gdp_cap'] * data['population']

0       6.567086e+09
1       7.585449e+09
2       8.758856e+09
3       9.648014e+09
4       9.678553e+09
            ...     
1699    6.508241e+09
1700    7.422612e+09
1701    9.037851e+09
1702    8.015111e+09
1703    5.782658e+09
Length: 1704, dtype: float64

In [144]:
# Total GDP in billions: per-capita GDP times population, divided by 10**9
data['GDP'] = data['gdp_cap'] * data['population'] / 10**9

In [145]:
data.head()

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap,YEAR+10,GDP
0,Afghanistan,1952,8425333,Asia,28.801,779.445314,1962,6.567086
1,Afghanistan,1957,9240934,Asia,30.332,820.85303,1967,7.585449
2,Afghanistan,1962,10267083,Asia,31.997,853.10071,1972,8.758856
3,Afghanistan,1967,11537966,Asia,34.02,836.197138,1977,9.648014
4,Afghanistan,1972,13079460,Asia,36.088,739.981106,1982,9.678553
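The `GDP` column multiplies two Series element-wise, with rows matched by index label, and then divides by 10**9, so the values are in billions. As a quick check, the sketch below rebuilds the first row from its displayed values (`gdp_cap` 779.445314, `population` 8425333) in a one-row frame and repeats the arithmetic. The result matches the `6.567086` shown above:

```python
import pandas as pd

# One row built from the values displayed for Afghanistan, 1952
row = pd.DataFrame({'gdp_cap': [779.445314], 'population': [8425333]})

# Element-wise product, scaled to billions
row['GDP'] = row['gdp_cap'] * row['population'] / 10**9

print(row['GDP'].round(6).iloc[0])  # 6.567086
```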


In [149]:
data['own'] = 99
data.head()

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap,GDP,own
0,Afghanistan,1952,8425333,Asia,28.801,779.445314,6.567086,99
1,Afghanistan,1957,9240934,Asia,30.332,820.85303,7.585449,99
2,Afghanistan,1962,10267083,Asia,31.997,853.10071,8.758856,99
3,Afghanistan,1967,11537966,Asia,34.02,836.197138,9.648014,99
4,Afghanistan,1972,13079460,Asia,36.088,739.981106,9.678553,99


In [150]:
# Dropping a column that we newly created

data.drop(columns=['own'], inplace=True)
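With `inplace=True`, `drop` modifies `data` directly and returns `None`. Without it, `drop` returns a new DataFrame and leaves the original untouched, which is often easier to reason about in a notebook. The sketch below uses a hypothetical two-row frame. It also shows `errors='ignore'`, which skips labels that are not present instead of raising a `KeyError`:

```python
import pandas as pd

toy = pd.DataFrame({'year': [1952, 1957], 'own': [99, 99]})

# Returns a new DataFrame; `toy` still has the 'own' column at this point
trimmed = toy.drop(columns=['own'])

# Dropping a missing column would raise KeyError; errors='ignore' skips it
same = trimmed.drop(columns=['not_there'], errors='ignore')

# inplace=True mutates `toy` and returns None
result = toy.drop(columns=['own'], inplace=True)

print(trimmed.columns.tolist(), toy.columns.tolist(), result)
```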

In [151]:
data

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap,GDP
0,Afghanistan,1952,8425333,Asia,28.801,779.445314,6.567086
1,Afghanistan,1957,9240934,Asia,30.332,820.853030,7.585449
2,Afghanistan,1962,10267083,Asia,31.997,853.100710,8.758856
3,Afghanistan,1967,11537966,Asia,34.020,836.197138,9.648014
4,Afghanistan,1972,13079460,Asia,36.088,739.981106,9.678553
...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306,6.508241
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786,7.422612
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960,9.037851
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623,8.015111


# Working with Rows

In [152]:
data

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap,GDP
0,Afghanistan,1952,8425333,Asia,28.801,779.445314,6.567086
1,Afghanistan,1957,9240934,Asia,30.332,820.853030,7.585449
2,Afghanistan,1962,10267083,Asia,31.997,853.100710,8.758856
3,Afghanistan,1967,11537966,Asia,34.020,836.197138,9.648014
4,Afghanistan,1972,13079460,Asia,36.088,739.981106,9.678553
...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306,6.508241
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786,7.422612
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960,9.037851
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623,8.015111


In [124]:
data.columns

Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap',
       'GDP'],
      dtype='object')

In [125]:
data.index

RangeIndex(start=0, stop=1704, step=1)

In [156]:
data.index = list(range(10, 1714))  # Shift every row label up by 10 (0 -> 10, 1 -> 11, 2 -> 12, ...)

data

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap,GDP
10,Afghanistan,1952,8425333,Asia,28.801,779.445314,6.567086
11,Afghanistan,1957,9240934,Asia,30.332,820.853030,7.585449
12,Afghanistan,1962,10267083,Asia,31.997,853.100710,8.758856
13,Afghanistan,1967,11537966,Asia,34.020,836.197138,9.648014
14,Afghanistan,1972,13079460,Asia,36.088,739.981106,9.678553
...,...,...,...,...,...,...,...
1709,Zimbabwe,1987,9216418,Africa,62.351,706.157306,6.508241
1710,Zimbabwe,1992,10704340,Africa,60.377,693.420786,7.422612
1711,Zimbabwe,1997,11404948,Africa,46.809,792.449960,9.037851
1712,Zimbabwe,2002,11926563,Africa,39.989,672.038623,8.015111
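Hard-coding `range(10, 1714)` only works because the frame has exactly 1704 rows. The sketch below shows two alternatives that derive the new labels from the frame itself. It uses a hypothetical three-row frame:

```python
import pandas as pd

toy = pd.DataFrame({'country': ['A', 'B', 'C']})

# Option 1: build the range from the row count instead of a fixed stop value
toy.index = range(10, 10 + len(toy))

# Option 2: shift the existing labels arithmetically
toy2 = pd.DataFrame({'country': ['A', 'B', 'C']})
toy2.index = toy2.index + 10

print(toy.index.tolist(), toy2.index.tolist())  # [10, 11, 12] [10, 11, 12]
```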


In [128]:
data.shape

(1704, 7)

In [129]:
sample = data.head()
sample

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap,GDP
10,Afghanistan,1952,8425333,Asia,28.801,779.445314,6.567086
11,Afghanistan,1957,9240934,Asia,30.332,820.85303,7.585449
12,Afghanistan,1962,10267083,Asia,31.997,853.10071,8.758856
13,Afghanistan,1967,11537966,Asia,34.02,836.197138,9.648014
14,Afghanistan,1972,13079460,Asia,36.088,739.981106,9.678553


The code `df.index = ['a','w','c']` assigns a new index to the DataFrame `df`. Here's an explanation of what it does:

- **Code**: `df.index = ['a','w','c']`
- **Functionality**:
  - The DataFrame `df` has an index, which is a set of labels that identify each row in the DataFrame.
  - By default, the index is a `RangeIndex` of integers 0, 1, 2, ..., N-1, where N is the number of rows.
  - However, you can customize the index to use labels of your choice instead of the default numeric index.
  - In the given code, `['a','w','c']` is a list of labels. By assigning this list to `df.index`, you are specifying that the DataFrame `df` should use these labels as its index.
  - After executing this code, the DataFrame `df` will have a new index composed of the labels 'a', 'w', and 'c', instead of the default numeric index.

- **Purpose**:
  - It replaces the row labels of `df` with the labels you supply.
  - Typical uses are giving rows descriptive labels, aligning the DataFrame with external data keyed by the same labels, and making label-based lookups with `loc` easier to read.

- **Example**:
  ```python
  import pandas as pd

  # Create a DataFrame
  df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

  # Assign new index
  df.index = ['a', 'w', 'c']

  print(df)
  ```
  Output:
  ```
     A  B
  a  1  4
  w  2  5
  c  3  6
  ```
- **Note**:
  - When assigning a new index, ensure that the length of the index list matches the number of rows in the DataFrame, otherwise you'll get a ValueError.
  - Assigning to `df.index` only relabels the rows in their current order. The row data itself is unchanged.
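The length rule in the note above can be checked directly. The sketch below uses a hypothetical frame and labels. It first catches the `ValueError` raised by a label list that is too short. It then shows `set_index`, which promotes an existing column to the index instead of taking a separate list:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Two labels for three rows: pandas refuses the assignment
try:
    df.index = ['a', 'w']
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)

# set_index uses an existing column's values as row labels
labelled = pd.DataFrame({'key': ['a', 'w', 'c'], 'A': [1, 2, 3]}).set_index('key')
print(labelled.loc['w', 'A'])  # 2
```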

In [130]:
sample.index = ['a','b','c','d','e']
sample

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap,GDP
a,Afghanistan,1952,8425333,Asia,28.801,779.445314,6.567086
b,Afghanistan,1957,9240934,Asia,30.332,820.85303,7.585449
c,Afghanistan,1962,10267083,Asia,31.997,853.10071,8.758856
d,Afghanistan,1967,11537966,Asia,34.02,836.197138,9.648014
e,Afghanistan,1972,13079460,Asia,36.088,739.981106,9.678553
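Once the rows carry letter labels, `loc` looks rows up by those labels, while `iloc` still counts positions from 0. The sketch below mirrors the `sample` cell on a hypothetical three-row frame:

```python
import pandas as pd

toy = pd.DataFrame({'year': [1952, 1957, 1962]})
toy.index = ['a', 'b', 'c']

by_label = toy.loc['b', 'year']   # row labelled 'b', column 'year'
by_position = toy.iloc[1, 0]      # second row, first column

print(by_label, by_position)  # 1957 1957
```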


In [131]:
url = "https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/002/276/original/tips.csv?1645193273"

df = pd.read_csv(url)
df

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.50,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.00,Female,Yes,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2


In Pandas, `loc` and `iloc` are two indexers, used with square brackets, for accessing data in a DataFrame.

1. **`loc`**:

    - `loc` is primarily label-based indexing: you select data by row and column labels (it also accepts boolean arrays).
    - Inside the brackets it takes a row selector and an optional column selector, separated by a comma: `df.loc[rows, cols]`.
    - It can accept single labels, lists of labels, or slice objects.
    - The return type depends on the selectors. A single label on both axes returns a scalar. A single label on one axis and a list or slice on the other returns a Series (an omitted column selector counts as `:`). Lists or slices on both axes return a DataFrame.
    - `loc` is inclusive of both the start and stop labels when using slice notation.
    - Example usage:
        ```python
        # Access a single element
        df.loc['row_label', 'column_label']
        
        # Access an entire row
        df.loc['row_label']
        
        # Access a subset of rows and columns
        df.loc[['row_label1', 'row_label2'], ['column_label1', 'column_label2']]
        
        # Access rows and all columns using slice notation
        df.loc['row_label1':'row_label2', :]
        ```

2. **`iloc`**:

    - `iloc` is integer-position-based indexing, allowing you to access data using the integer positions of rows and columns.
    - It works like Python's native sequence indexing: positions start at 0, and negative positions count from the end.
    - Like `loc`, it takes a row selector and an optional column selector (here, integer positions), separated by a comma.
    - It can accept single integers, lists of integers, or slice objects.
    - `iloc` is exclusive of the stop index when using slice notation.
    - Example usage:
        ```python
        # Access a single element
        df.iloc[row_index, column_index]
        
        # Access an entire row
        df.iloc[row_index]
        
        # Access a subset of rows and columns
        df.iloc[[row_index1, row_index2], [column_index1, column_index2]]
        
        # Access rows and all columns using slice notation
        df.iloc[row_index1:row_index2, :]
        ```

In summary, `loc` is used for label-based indexing, where you specify row and column labels, while `iloc` is used for integer-based indexing, where you specify integer positions of rows and columns. Understanding the difference between these two methods is crucial for effectively accessing and manipulating data in Pandas DataFrames.
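The difference matters most when the index holds integers that do not match positions, as with `data` after its labels were shifted by 10. The minimal sketch below uses a hypothetical frame with labels 10 to 14. It compares slicing with both indexers and checks the return types described above:

```python
import pandas as pd

toy = pd.DataFrame({'v': [100, 101, 102, 103, 104]}, index=range(10, 15))

# loc slices by label and includes the stop label 12
by_label = toy.loc[10:12, 'v']
# iloc slices by position and excludes the stop position 2
by_position = toy.iloc[0:2, 0]

print(by_label.tolist())     # [100, 101, 102]
print(by_position.tolist())  # [100, 101]

# Return types follow the selectors
scalar = toy.loc[10, 'v']          # single label on both axes -> scalar
row = toy.loc[10]                  # single row label, all columns -> Series
frame = toy.loc[[10, 11], ['v']]   # lists on both axes -> DataFrame
```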

In [132]:
df.iloc[0]

total_bill     16.99
tip             1.01
sex           Female
smoker            No
day              Sun
time          Dinner
size               2
Name: 0, dtype: object

In [133]:
df.iloc[-1]

total_bill     18.78
tip              3.0
sex           Female
smoker            No
day             Thur
time          Dinner
size               2
Name: 243, dtype: object
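Negative positions are an `iloc` feature. With the default integer index, `loc[-1]` looks for a row *labelled* -1. No such row exists, so it raises a `KeyError`. The sketch below uses a hypothetical three-row frame:

```python
import pandas as pd

toy = pd.DataFrame({'tip': [1.01, 1.66, 3.50]})

last = toy.iloc[-1]          # last row by position

try:
    toy.loc[-1]              # label -1 is not in the RangeIndex
    loc_failed = False
except KeyError:
    loc_failed = True

print(last['tip'], loc_failed)  # 3.5 True
```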

In [134]:
df.iloc[:2, :2]  # .iloc[row_positions, column_positions]: integer positions, stop excluded -> rows 0-1, first two columns

Unnamed: 0,total_bill,tip
0,16.99,1.01
1,10.34,1.66


In [157]:
df.loc[:2, "total_bill":"day"]  # .loc[row_labels, column_labels]: labels, both ends included -> rows 0-2, total_bill through day

Unnamed: 0,total_bill,tip,sex,smoker,day
0,16.99,1.01,Female,No,Sun
1,10.34,1.66,Male,No,Sun
2,21.01,3.5,Male,No,Sun
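You can select the same block with `iloc` by converting the column labels to positions with `Index.get_loc`. Because `iloc` excludes the stop position, the stops are one past the last wanted row and column. The sketch below builds a small frame from the first four rows of the tips output shown above:

```python
import pandas as pd

tips = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68],
    'tip': [1.01, 1.66, 3.50, 3.31],
    'sex': ['Female', 'Male', 'Male', 'Male'],
    'smoker': ['No', 'No', 'No', 'No'],
    'day': ['Sun', 'Sun', 'Sun', 'Sun'],
    'time': ['Dinner'] * 4,
})

# Label-based: rows 0..2 and columns total_bill..day, both ends included
by_label = tips.loc[:2, 'total_bill':'day']

# Position-based equivalent: convert labels to positions, stop one past the end
start = tips.columns.get_loc('total_bill')
stop = tips.columns.get_loc('day') + 1
by_position = tips.iloc[:3, start:stop]

print(by_position.equals(by_label))  # True
```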


In [136]:
df[['time', 'total_bill', 'tip']]

Unnamed: 0,time,total_bill,tip
0,Dinner,16.99,1.01
1,Dinner,10.34,1.66
2,Dinner,21.01,3.50
3,Dinner,23.68,3.31
4,Dinner,24.59,3.61
...,...,...,...
239,Dinner,29.03,5.92
240,Dinner,27.18,2.00
241,Dinner,22.67,2.00
242,Dinner,17.82,1.75


In [137]:
df.loc[:, ['time', 'total_bill', 'tip']]

Unnamed: 0,time,total_bill,tip
0,Dinner,16.99,1.01
1,Dinner,10.34,1.66
2,Dinner,21.01,3.50
3,Dinner,23.68,3.31
4,Dinner,24.59,3.61
...,...,...,...
239,Dinner,29.03,5.92
240,Dinner,27.18,2.00
241,Dinner,22.67,2.00
242,Dinner,17.82,1.75
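Both cells return the same columns in the listed order. A list inside plain brackets selects columns only, while `loc` also lets you restrict rows in the same call. The sketch below uses a small hypothetical frame. It also shows that a single string in brackets returns a Series, whereas a one-element list returns a DataFrame:

```python
import pandas as pd

tips = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01],
    'tip': [1.01, 1.66, 3.50],
    'time': ['Dinner', 'Dinner', 'Dinner'],
})
cols = ['time', 'total_bill', 'tip']

via_brackets = tips[cols]
via_loc = tips.loc[:, cols]
first_two = tips.loc[:1, cols]   # rows labelled 0 and 1 only

one_col = tips['tip']            # Series
one_col_df = tips[['tip']]       # DataFrame with a single column

print(via_brackets.equals(via_loc))  # True
```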


The code `pd.DataFrame(df, columns=['time', 'total_bill', 'tip'])` creates a new DataFrame from the existing DataFrame `df`, selecting only the listed columns. Here's how it works:

- **Code**: `pd.DataFrame(df, columns=['time', 'total_bill', 'tip'])`
- **Functionality**:
  - The `pd.DataFrame()` function is used to create a new DataFrame object.
  - The first argument (`df`) specifies the data that will be used to populate the new DataFrame. In this case, it refers to the existing DataFrame `df`.
  - The `columns` parameter specifies the column labels for the new DataFrame.
  - By specifying the `columns` parameter, you are selecting a subset of columns from the existing DataFrame `df` to include in the new DataFrame.
  - The columns appear in the order they are listed in `columns`, not in their order in `df`.

- **Example**:
  ```python
  import pandas as pd

  # Create an existing DataFrame
  df = pd.DataFrame({
      'time': ['Lunch', 'Dinner', 'Lunch', 'Dinner', 'Dinner'],
      'total_bill': [14.89, 17.23, 22.44, 19.99, 32.00],
      'tip': [2.0, 2.5, 3.5, 4.0, 4.5]
  })

  # Create a new DataFrame with selected columns
  new_df = pd.DataFrame(df, columns=['time', 'total_bill', 'tip'])

  print(new_df)
  ```
  Output:
  ```
       time  total_bill  tip
  0   Lunch       14.89  2.0
  1  Dinner       17.23  2.5
  2   Lunch       22.44  3.5
  3  Dinner       19.99  4.0
  4  Dinner       32.00  4.5
  ```
- **Note**:
  - If the columns specified in the `columns` parameter do not exist in the original DataFrame `df`, they will be added as new columns with NaN values in the new DataFrame.
  - This operation creates a new DataFrame and leaves the original `df` unchanged. To keep only these columns in `df`, assign the result back to it: `df = pd.DataFrame(df, columns=['time', 'total_bill', 'tip'])`.
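The sketch below checks the NaN behaviour described in the note on a hypothetical two-row frame. It also shows `DataFrame.reindex(columns=...)`, a more explicit way to write the same column selection:

```python
import pandas as pd

df = pd.DataFrame({'time': ['Lunch', 'Dinner'], 'tip': [2.0, 2.5]})

# 'size' is not a column of df, so it is filled with NaN
new_df = pd.DataFrame(df, columns=['tip', 'time', 'size'])

# reindex expresses the same selection explicitly
re_df = df.reindex(columns=['tip', 'time', 'size'])

print(new_df)
print(df.columns.tolist())  # ['time', 'tip'] (original unchanged)
```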

In [158]:
pd.DataFrame(df, columns=['time', 'total_bill', 'tip'])

Unnamed: 0,time,total_bill,tip
0,Dinner,16.99,1.01
1,Dinner,10.34,1.66
2,Dinner,21.01,3.50
3,Dinner,23.68,3.31
4,Dinner,24.59,3.61
...,...,...,...
239,Dinner,29.03,5.92
240,Dinner,27.18,2.00
241,Dinner,22.67,2.00
242,Dinner,17.82,1.75


In [159]:
df.iloc[:,0:2]

Unnamed: 0,total_bill,tip
0,16.99,1.01
1,10.34,1.66
2,21.01,3.50
3,23.68,3.31
4,24.59,3.61
...,...,...
239,29.03,5.92
240,27.18,2.00
241,22.67,2.00
242,17.82,1.75
