# Pandas Tutorial: Day 1
Here's what we are going to do today:

* [What is pandas?](#1)
* [Get our environment setup](#2)
* [Pandas Data Structure](#3)
* [Import data](#4)
* [Exporting data](#5)
* [Creating test Dataframe](#6)

## What is pandas?<a id='1'></a>
**pandas** is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

**pandas** is a NumFOCUS sponsored project. This sponsorship helps ensure the success of pandas development as a world-class open-source project, and makes it possible to donate to the project.

## Get our environment setup<a id='2'></a>

In [1]:
# importing useful libraries
import pandas as pd # data processing
import numpy as np
import os

## Pandas Data Structure<a id='3'></a>
pandas has two main data structures:
1. Series
1. DataFrame

### Series 
A Series is a one-dimensional labeled array. It can hold any type of data (integers, floats, strings, Python objects, and so on).

In [2]:
mySeries = pd.Series([3, -5, 7, 4], index = ['a', 'b', 'c', 'd'])
print(mySeries)
print(type(mySeries))

a    3
b   -5
c    7
d    4
dtype: int64
<class 'pandas.core.series.Series'>
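
The labels are what make a Series more than a plain array. As a minimal sketch (the variable names `value_b`, `subset`, and `mixed` are illustrative, not part of the original notebook), the same Series can be queried by label, and a Series holding mixed Python values falls back to the generic `object` dtype:

```python
import pandas as pd

# Same values and labels as the Series created above
my_series = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# Look up a single value by its label
value_b = my_series['b']

# Select several labels at once; the result is another Series
subset = my_series[['a', 'c']]

# A Series with mixed Python values is stored with dtype "object"
mixed = pd.Series([1, 'two', 3.0])

print(value_b)
print(subset)
print(mixed.dtype)
```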


### DataFrame
A DataFrame is a two-dimensional labeled data structure made up of rows and columns. Each column can hold a different type of data.

In [3]:
data = {'Country' : ['Belgium', 'India', 'Brazil'],
       'Capital' : ['Brussels', 'New Delhi', 'Brasilia'],
       'Population' : [12345,  123456, 98745]}

df = pd.DataFrame(data, columns = ['Country', 'Capital', 'Population'])
print(df)
print(type(data))
print(type(df))

   Country    Capital  Population
0  Belgium   Brussels       12345
1    India  New Delhi      123456
2   Brazil   Brasilia       98745
<class 'dict'>
<class 'pandas.core.frame.DataFrame'>
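
To connect the two structures: each column of a DataFrame is itself a Series. The sketch below uses a trimmed-down version of the dictionary above (two columns only) and illustrative variable names to show the shape of the table, a column lookup, and a single-cell lookup with `.loc`:

```python
import pandas as pd

# A smaller version of the dictionary used above
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Population': [12345, 123456, 98745]}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# Selecting one column returns a Series
population = df['Population']

# .loc selects by row label and column label
india_population = df.loc[1, 'Population']

print(n_rows, n_cols)
print(type(population))
print(india_population)
```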


## Import Data<a id='4'></a>
Data scientists are expected to build high-performing machine learning models, but the starting point is getting the data into the Python environment. Only after importing the data can the data scientist clean, wrangle, visualize, and build predictive models on it.

In this guide, you'll learn the techniques to import data into Python. 

### Import CSV files
Note that a single backslash does not work when specifying a Windows file path, because Python treats the backslash as an escape character inside string literals. Either use forward slashes or double each backslash, as below:
* import pandas as pd
* mydata= pd.read_csv("C:\\Users\\Deepanshu\\Documents\\file1.csv")
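
A raw string (prefix `r`) is a third way to write such a path. The sketch below compares the spellings and then does a small round trip; since the `C:\Users\...` path is only an example and will not exist on most machines, the round trip writes to a temporary directory instead (the column names `x` and `y` are made up for illustration):

```python
import os
import tempfile
import pandas as pd

# Three spellings of the same example Windows path
escaped = "C:\\Users\\Deepanshu\\Documents\\file1.csv"   # doubled backslashes
raw = r"C:\Users\Deepanshu\Documents\file1.csv"          # raw string
forward = "C:/Users/Deepanshu/Documents/file1.csv"       # forward slashes

# The escaped and raw spellings produce exactly the same string
same_string = (escaped == raw)

# Round trip with a file that is guaranteed to exist:
# write a tiny CSV to a temporary directory and read it back
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "file1.csv")
pd.DataFrame({"x": [1, 2], "y": [3, 4]}).to_csv(csv_path, index=False)

mydata = pd.read_csv(csv_path)
print(same_string)
print(mydata)
```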

### Import File from URL
You don't need any additional steps to fetch data from a URL. Simply pass the URL to the read_csv() function (this applies only to CSV files hosted at that URL).
* mydata = pd.read_csv("http://winterolympicsmedals.com/medals.csv")


### Read Text File
We can use the read_table() function to read data from a text file. We can also use read_csv() with sep="\t" to read data from a tab-separated file.
* mydata = pd.read_table("C:\\Users\\jasprit\\Desktop\\example2.txt")
* mydata = pd.read_csv("C:\\Users\\jasprit\\Desktop\\example2.txt", sep ="\t")
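
The two calls should give the same result for a tab-separated file. As a quick check, the sketch below writes a small made-up tab-separated file (the contents and the temporary location are assumptions for illustration) and reads it both ways:

```python
import os
import tempfile
import pandas as pd

# Create a small tab-separated text file in a temporary directory
tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, "example2.txt")
with open(txt_path, "w") as f:
    f.write("name\tscore\nasha\t90\nravi\t85\n")

# read_table() uses a tab as its default separator
from_table = pd.read_table(txt_path)

# read_csv() needs the separator spelled out
from_csv = pd.read_csv(txt_path, sep="\t")

# Both approaches yield the same DataFrame
identical = from_table.equals(from_csv)
print(from_table)
print(identical)
```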

### Read Excel File
The read_excel() function can be used to import excel data into Python.
* mydata = pd.read_excel("https://www.eia.gov/dnav/pet/hist_xls/RBRTEd.xls", sheet_name="Data 1", skiprows=2)

If you do not specify a sheet with the sheet_name= option, the first sheet is read by default. Older pandas versions spelled this option sheetname; current versions accept only sheet_name.

## Exporting Data<a id='5'></a>
The following methods save a DataFrame in the format you want.
* df.to_csv(filename) -> Writes to a CSV file
* df.to_excel(filename) -> Writes to an Excel file
* df.to_sql(table_name, connection_object) -> Writes to a SQL table
* df.to_json(filename) -> Writes to a file in JSON format
* df.to_html(filename) -> Saves as an HTML table
* df.to_clipboard() -> Writes to the clipboard
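
As a small sketch of two of these methods, the example below writes a made-up DataFrame to CSV and JSON in a temporary directory and reads the CSV back. The file names are hypothetical; `index=False` is an optional argument that keeps the row index out of the CSV file.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India'],
                   'Population': [12345, 123456]})

# Write into a temporary directory so nothing is left in the working folder
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, 'countries.csv')
json_path = os.path.join(out_dir, 'countries.json')

# index=False leaves the 0, 1, ... row labels out of the file
df.to_csv(csv_path, index=False)

# orient='records' writes one JSON object per row
df.to_json(json_path, orient='records')

# Reading the CSV back gives the same columns and values
restored = pd.read_csv(csv_path)
print(restored)
```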

## Creating test Dataframe<a id='6'></a>

In [4]:
# Let's make a dataframe of 5 columns and 20 rows
pd.DataFrame(np.random.rand(20, 5))

          0         1         2         3         4
0  0.111026  0.079486  0.025729  0.572264  0.133045
1  0.072044  0.698252  0.395473  0.348387  0.577644
2  0.178542  0.908348  0.717975  0.720574  0.232078
3  0.334072  0.436854  0.353583  0.695982  0.744723
4  0.949541  0.101905  0.695278  0.140938  0.061982
5  0.730161  0.139950  0.577167  0.078380  0.816456
6  0.438440  0.826850  0.925094  0.612460  0.491583
7  0.552227  0.643066  0.300352  0.409719  0.485908
8  0.139359  0.220833  0.062681  0.518763  0.527097
9  0.234335  0.756339  0.714565  0.108209  0.030406
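
Because the values are random, they change on every run. One possible variation, sketched here with a seeded NumPy generator and made-up column names `A` to `E`, keeps the test DataFrame reproducible and easier to reference:

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same random numbers on every run
rng = np.random.default_rng(seed=0)

# 20 rows and 5 columns of floats in [0, 1), with named columns
test_df = pd.DataFrame(rng.random((20, 5)), columns=list('ABCDE'))

print(test_df.shape)
print(test_df.head())  # head() shows the first 5 rows
```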


Hooray! We are done with some basics of pandas. Next, we will move on to summarizing data.