**<center><h1>Pandas DataFrame</h1></center>**

A DataFrame is a two-dimensional, labeled data structure with a row index and a column index. Each column is a Series object.

![alt text](https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRfXpUlAkzyapM437-_ZN_E2xn3fhllCXZi0A&usqp=CAU)

A DataFrame has two indexes:
- The column index, or column names: column names can be a list of strings or integers.
- The row index: row labels can be integers, strings, a DatetimeIndex, or a PeriodIndex (for time series).
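As a minimal sketch of the two indexes, the snippet below builds a small DataFrame and inspects `df.columns` and `df.index`. The data and the row labels `r1` and `r2` are made up for illustration.

```python
import pandas as pd

# Illustrative data; the column names and row labels are assumptions for this sketch.
df = pd.DataFrame(
    {"Name": ["Tom", "Jack"], "Age": [28, 34]},
    index=["r1", "r2"],
)

column_labels = list(df.columns)  # the column index (column names)
row_labels = list(df.index)       # the row index (row labels)

print(column_labels)
print(row_labels)
```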
   
**Syntax**

```
pandas.DataFrame(data, index, columns)

data : Input data. Accepts an ndarray, Series, dict,
        lists, constants, or another DataFrame.

index : Row labels. Defaults to a RangeIndex (0, 1, ..., n-1) if no index is passed.

columns : Column labels. Defaults to a RangeIndex (0, 1, ..., n-1) if no column names are passed.
```
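The defaults described in the syntax block can be checked with a short sketch. The nested list and the labels `x`, `y`, `a`, `b`, `c` are illustrative assumptions, not part of the original text.

```python
import pandas as pd

# Two rows, three columns, no labels supplied.
rows = [[1, 2, 3], [4, 5, 6]]

df_default = pd.DataFrame(rows)
print(df_default.index)    # default row labels: 0, 1
print(df_default.columns)  # default column labels: 0, 1, 2

# Explicit labels; their lengths must match the shape of the data.
df_labeled = pd.DataFrame(rows, index=["x", "y"], columns=["a", "b", "c"])
print(df_labeled)
```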

A pandas DataFrame can be created from various inputs, such as:

- Dictionary
- List
- Series
- NumPy ndarrays
- Another DataFrame
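The sections that follow focus on dictionaries and lists. For the remaining input types, a rough sketch might look like this; the variable names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# From a single Series: the Series name becomes the column name.
s = pd.Series([10, 20, 30], name="score")
df_from_series = pd.DataFrame(s)

# From a 2-D NumPy ndarray: column names are supplied explicitly.
arr = np.array([[1, 2], [3, 4]])
df_from_array = pd.DataFrame(arr, columns=["x", "y"])

# From another DataFrame: copy=True requests an independent copy of the data.
df_copy = pd.DataFrame(df_from_array, copy=True)
df_copy.loc[0, "x"] = 99  # changes the copy only

print(df_from_series)
print(df_from_array)
print(df_copy)
```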

**<h2>Dictionary</h2>**

1. **<u>Create a DataFrame from Dict of ndarrays / Lists</u>**  :

Each array or list must have the same length. If an index is provided, its length must match the length of the arrays. If no index is specified, the default index runs from 0 to n-1, where n is the array length.


In [None]:
import pandas as pd

# Avoid naming the variable `dict`, which would shadow the built-in type.
college_data = {"College": ["NCET", "RV", "RMS College"],
                "Pass Percentage": [90.0, 90.0, 90.0],
                "Place": ["Bangalore", "Bangalore", "Bangalore"]}


df = pd.DataFrame(college_data)
print(df)

       College  Pass Percentage      Place
0         NCET             90.0  Bangalore
1           RV             90.0  Bangalore
2  RMS College             90.0  Bangalore


In [None]:
# Let us now create an indexed DataFrame using arrays.
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],
        'Age':[28,34,29,42]}

df = pd.DataFrame(data,
                  index=['rank1','rank2','rank3','rank4'])
print(df)

        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42


In [None]:
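# Lists of unequal length raise a ValueError:
# 'Age' has 2 values, while 'Degree' and the index have 3.
data = {"Name": "Santhosh",
        'Age':[18,19],
        'Degree':['MBA','MSC','CA']}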
df = pd.DataFrame(data, columns = ['Name', 'Age','Degree'],
                  index = ['a','b','c'])
df

ValueError: ignored
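The failing cell above can be sketched with the error caught explicitly. Padding the shorter list with `None` is one possible fix, shown here as an assumption about the intent rather than the only approach.

```python
import pandas as pd

data = {"Name": "Santhosh",              # scalar, broadcast to every row
        "Age": [18, 19],                 # two values
        "Degree": ["MBA", "MSC", "CA"]}  # three values

try:
    pd.DataFrame(data, index=["a", "b", "c"])
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# One possible fix: pad the shorter list so every list has three entries.
data["Age"] = [18, 19, None]
df = pd.DataFrame(data, index=["a", "b", "c"])
print(df)
```

The scalar `"Santhosh"` is not the problem: pandas repeats a scalar for every row when an index is given. Only the mismatched list lengths trigger the error.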

2. <u>**Create a DataFrame from List of Dicts**</u>

A DataFrame can be created from a list of dictionaries. The dictionary keys become the column names, and NaN is filled in wherever a dictionary lacks a key.

In [None]:
# Observe, NaN (Not a Number) is appended in missing areas.
data = [{'a': 1, 'b': 2},
        {'a': 5, 'b': 10, 'c': 20}]

df = pd.DataFrame(data)
print(df)

   a   b     c
0  1   2   NaN
1  5  10  20.0


In [None]:
# Each dict is one row; a list value is stored whole in a single cell, not expanded into rows.
data = [{"Name": "Santhosh",
        'Age':[18,19],
        'Degree':['MBA','MSC']},
        {'Degree': ['CA']}]
df = pd.DataFrame(data, columns = ['Name','Age', 'Degree'])
df

       Name       Age      Degree
0  Santhosh  [18, 19]  [MBA, MSC]
1       NaN       NaN        [CA]


In [None]:
# we can also pass index
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

df = pd.DataFrame(data,
                  index=['first', 'second'])
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0


3. **Create a DataFrame from Dict of Series**

A dictionary of Series can be passed to form a DataFrame. The resulting index is the union of all the Series indexes passed; labels missing from a Series are filled with NaN.

In [None]:
import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print("\n",df)


    one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4


In [None]:
data = {"Name": pd.Series("Santhosh", index = ['a','b']),
        'Age':pd.Series([18,19], index = ['a','b']),
        'Degree':pd.Series(['MBA','MSC','CA'], index = ['a','b', 'c'])}
df = pd.DataFrame(data, columns = ['Name', 'Age','Degree'])
df

       Name   Age Degree
a  Santhosh  18.0    MBA
b  Santhosh  19.0    MSC
c       NaN   NaN     CA


**<h2>List</h2>**

1. <u>**The DataFrame can be created using a single list or a list of lists.**</u>
  
A DataFrame can be created by passing a single list or a list of lists as input. Column names can be specified with the `columns` argument. If they are not given, the columns default to 0, 1, ..., k-1, where k is the length of the longest inner list (a single list produces one column named 0).

In [None]:
# Data Frame using single List
data = [1,2,3,4,5]

df = pd.DataFrame(data)

print("using single list\n\n",df)


using single list

    0
0  1
1  2
2  3
3  4
4  5


In [None]:
# Data Frame using list of list
data = [['Alex',10],
        ['Bob',12],
        ['Clarke',13]]

df = pd.DataFrame(data,
                  columns=['Name','Age'])

print("\n using list of list\n",df)



 using list of list
      Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13


In [None]:
# DataFrame from a list of lists with unequal lengths; gaps are filled with NaN
data = [['Alex',10],
        ['Bob',12],
        ['Clarke',13, 14]]

df = pd.DataFrame(data)

print("\n using list of list\n",df)



 using list of list
         0   1     2
0    Alex  10   NaN
1     Bob  12   NaN
2  Clarke  13  14.0


**<h2>Using CSV or TSV or Excel file</h2>**

If the data is in a CSV, TSV, or Excel file, it can be loaded with the pandas reader functions:
```
csv_file = pd.read_csv('file_name.csv')  # for a CSV file

tsv_file = pd.read_csv('file_name.tsv', sep='\t')  # for a TSV file

excel_file = pd.read_excel('excel_file.xls')  # for an Excel file
```
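To try the readers without files on disk, the sketch below feeds `pd.read_csv` from in-memory text via `io.StringIO`. The sample contents are made up for illustration; with real files, a path is passed instead.

```python
import io
import pandas as pd

# In-memory stand-ins for files on disk (contents are assumptions for this sketch).
csv_text = "Name,Age\nAlex,10\nBob,12\n"
tsv_text = "Name\tAge\nAlex\t10\nBob\t12\n"

csv_df = pd.read_csv(io.StringIO(csv_text))            # comma is the default separator
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-separated values

print(csv_df)
print(tsv_df)
```

Note that `pd.read_excel` relies on an optional engine package being installed (for example `openpyxl` for `.xlsx` files or `xlrd` for `.xls` files).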
