# DataFrame 
*A DataFrame is a two-dimensional, labeled data structure: a table whose data is organized in rows and columns, similar to a two-dimensional array.*

We can create a pandas DataFrame in the following ways:

1. Using a Python dictionary
2. Using a Python list
3. Creating an empty DataFrame
4. From a file


In [22]:
import pandas as pd

In [23]:
# Build a DataFrame from a dictionary: keys become column names and values are lists of column data.
# First, create the dictionary.
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']}
# Then create the DataFrame from the dictionary.
# Mind the spelling: in DataFrame, both D and F are capitalized.
df = pd.DataFrame(data)
print(df)
# By default, the index of the DataFrame is 0, 1, 2, ...

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris
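
To make the mapping from dictionary to table concrete, the following minimal sketch rebuilds the same DataFrame and inspects its labels. The variable names `columns`, `index_labels`, and `shape` are only illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

# Dictionary keys become the column labels, in insertion order.
columns = list(df.columns)
# With no index given, the rows are labeled 0, 1, 2, ...
index_labels = list(df.index)
# shape is (number of rows, number of columns).
shape = df.shape

print(columns)
print(index_labels)
print(shape)
```

Every list in the dictionary should have the same length, since each one supplies a full column.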


In [24]:
# Use the set_index() method to make a column the index of the DataFrame
df = df.set_index('Name')
print(df)
# The same approach works for any DataFrame when setting a column as the index

       Age      City
Name                
John    25  New York
Alice   30    London
Bob     35     Paris
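
As a small sketch of what setting the index changes, the example below keeps the original and the indexed DataFrame in separate variables (`indexed` and `restored` are illustrative names). It shows that `set_index()` returns a new DataFrame, that rows can then be looked up by label with `.loc`, and that `reset_index()` turns the index back into an ordinary column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'London', 'Paris']})

# set_index() returns a new DataFrame; df itself is left unchanged.
indexed = df.set_index('Name')

# With names as the index, a row can be selected by its label.
alice_age = indexed.loc['Alice', 'Age']

# reset_index() moves 'Name' back into a regular column.
restored = indexed.reset_index()

print(indexed.index.name)
print(alice_age)
print(list(restored.columns))
```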


In [25]:
# Set a custom RangeIndex with pd.RangeIndex(start, stop, name='<index name>'); stop is exclusive, so (5, 8) gives 5, 6, 7
df = pd.DataFrame(data, index=pd.RangeIndex(5, 8, name='Index'))
print(df)

        Name  Age      City
Index                      
5       John   25  New York
6      Alice   30    London
7        Bob   35     Paris
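
The sketch below illustrates two details of the cell above: the `stop` value of `pd.RangeIndex` is not included, and the index must have as many labels as the data has rows. The flag `mismatch_rejected` is just an illustrative name.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']}

# start=5, stop=8: the labels are 5, 6, 7 (stop is exclusive).
idx = pd.RangeIndex(5, 8, name='Index')
labels = list(idx)
df = pd.DataFrame(data, index=idx)

# An index with four labels does not fit three rows of data.
try:
    pd.DataFrame(data, index=pd.RangeIndex(5, 9))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(labels)
print(mismatch_rejected)
```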


# Creating a DataFrame from a List

In [26]:
data = [['Ramesh',25,'New York'],
        ['Suresh',30,'London'],
        ['Mahesh',35,'Paris']]
# Create a DataFrame from a list of lists; each inner list becomes one row
df = pd.DataFrame(data)
print(df)

        0   1         2
0  Ramesh  25  New York
1  Suresh  30    London
2  Mahesh  35     Paris


In [27]:
# We can name the columns and the rows (index); otherwise both default to 0, 1, 2, ... as in the example above.
# To do that, pass the columns and index parameters to pd.DataFrame().
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'], index=['person1', 'Person2', 'Person3'])
print(df)

           Name  Age      City
person1  Ramesh   25  New York
Person2  Suresh   30    London
Person3  Mahesh   35     Paris
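
Labels do not have to be supplied at construction time. As a minimal sketch of an alternative, the example below creates the DataFrame with the default labels and assigns column names afterwards through the `columns` attribute. The variable names `rows` and `ages` are illustrative.

```python
import pandas as pd

rows = [['Ramesh', 25, 'New York'],
        ['Suresh', 30, 'London'],
        ['Mahesh', 35, 'Paris']]

# Without a columns argument, the columns are labeled 0, 1, 2.
df = pd.DataFrame(rows)
default_columns = list(df.columns)

# Column labels can be assigned after the DataFrame exists.
df.columns = ['Name', 'Age', 'City']

# Each inner list was one row, so a column collects one value per row.
ages = df['Age'].tolist()

print(default_columns)
print(list(df.columns))
print(ages)
```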


# Creating an Empty DataFrame

In [28]:
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []
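
An empty DataFrame is mostly useful as a starting point. The sketch below checks that it is empty and then fills it by assigning columns one at a time; the column values are reused from the dictionary example earlier in these notes.

```python
import pandas as pd

df = pd.DataFrame()

# A freshly created DataFrame has no rows and no columns.
was_empty = df.empty
empty_shape = df.shape

# Assigning a list to a new column label adds that column.
df['Name'] = ['John', 'Alice', 'Bob']
df['Age'] = [25, 30, 35]

print(was_empty)
print(empty_shape)
print(df.shape)
```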


# Creating a DataFrame from a File

In [29]:
# Load a CSV file into a DataFrame with the read_csv() method
df = pd.read_csv('student_performance.csv')
print(df)

    Student_ID  Age  Gender  Class  Attendance_Percentage  Math_Score  \
0        S0001   15    Male     12                   65.0         NaN   
1        S0002   19  Female      9                   58.0        80.0   
2        S0003   14  Female     12                    NaN        83.0   
3        S0004   18  Female      9                   68.0        68.0   
4        S0005   14    Male     10                   80.0        41.0   
..         ...  ...     ...    ...                    ...         ...   
143      S0144   14    Male     11                   94.0         NaN   
144      S0145   19    Male     12                   56.0        83.0   
145      S0146   19  Female     12                    NaN        51.0   
146      S0147   19    Male     11                   54.0        96.0   
147      S0148   19  Female     11                   91.0        65.0   

     Science_Score  English_Score  Final_Percentage  
0              NaN           72.0             50.33  
1             4
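
The `NaN` entries in the output above are how pandas marks fields that are blank in the CSV. The sketch below reproduces this without needing the file: it reads a small in-memory CSV through `io.StringIO`. The text in `csv_text` is an assumption made for illustration (three columns and three rows copied from the printed output), not the actual contents of `student_performance.csv`.

```python
import io

import pandas as pd

# Illustrative sample only: three columns of the first three rows printed above.
csv_text = """Student_ID,Age,Math_Score
S0001,15,
S0002,19,80.0
S0003,14,83.0
"""

# read_csv() accepts a file-like object as well as a file name.
df = pd.read_csv(io.StringIO(csv_text))

# The blank Math_Score field in the first row is read as NaN.
missing_math = int(df['Math_Score'].isna().sum())

print(df)
print(missing_math)
```

For a large file, `print(df.head())` shows only the first five rows instead of the shortened full table.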

In [30]:
# read_csv() also accepts the full path to the file
df = pd.read_csv('/Users/ypragnesh/Desktop/Pandas_Notes/student_performance.csv')
print(df)

    Student_ID  Age  Gender  Class  Attendance_Percentage  Math_Score  \
0        S0001   15    Male     12                   65.0         NaN   
1        S0002   19  Female      9                   58.0        80.0   
2        S0003   14  Female     12                    NaN        83.0   
3        S0004   18  Female      9                   68.0        68.0   
4        S0005   14    Male     10                   80.0        41.0   
..         ...  ...     ...    ...                    ...         ...   
143      S0144   14    Male     11                   94.0         NaN   
144      S0145   19    Male     12                   56.0        83.0   
145      S0146   19  Female     12                    NaN        51.0   
146      S0147   19    Male     11                   54.0        96.0   
147      S0148   19  Female     11                   91.0        65.0   

     Science_Score  English_Score  Final_Percentage  
0              NaN           72.0             50.33  
1             4
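
A hard-coded absolute path only works on the machine where it was written. As a hedged sketch of building the path in code instead, the example below writes a tiny CSV into a temporary directory and reads it back through its absolute path. The file name `sample.csv` and its two rows are hypothetical and chosen only for the demonstration.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file created just for this example.
    csv_path = Path(tmp) / 'sample.csv'
    csv_path.write_text('Student_ID,Age\nS0001,15\nS0002,19\n')

    # resolve() gives the absolute path of the file.
    absolute = csv_path.resolve()

    # read_csv() accepts a pathlib.Path or the same path as a plain string.
    df = pd.read_csv(absolute)
    df_from_str = pd.read_csv(str(absolute))

print(df)
print(df.equals(df_from_str))
```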

# Task
Here is another data file (laptop_prices.csv). Repeat the examples above on it for extra practice.

In [31]:
import pandas as pd
df = pd.read_csv('laptop_prices.csv')
print(df.head())

   laptop_ID Company      Product   TypeName  Inches  \
0          1   Apple  MacBook Pro  Ultrabook    13.3   
1          2   Apple  Macbook Air  Ultrabook    13.3   
2          3      HP       250 G6   Notebook    15.6   
3          4   Apple  MacBook Pro  Ultrabook    15.4   
4          5   Apple  MacBook Pro  Ultrabook    13.3   

                     ScreenResolution                         Cpu   Ram  \
0  IPS Panel Retina Display 2560x1600        Intel Core i5 2.3GHz   8GB   
1                            1440x900        Intel Core i5 1.8GHz   8GB   
2                   Full HD 1920x1080  Intel Core i5 7200U 2.5GHz   8GB   
3  IPS Panel Retina Display 2880x1800        Intel Core i7 2.7GHz  16GB   
4  IPS Panel Retina Display 2560x1600        Intel Core i5 3.1GHz   8GB   

                Memory                           Gpu  OpSys  Weight  \
0            128GB SSD  Intel Iris Plus Graphics 640  macOS  1.37kg   
1  128GB Flash Storage        Intel HD Graphics 6000  macOS  1.34kg   