Importing data into the Python environment is the first step toward finding the insights that matter. After importing the data, you can clean and visualize it.
You'll learn about the different types of files you can import and the different ways to import data into Python.

First, you'll see the flat file data types.

## CSV Files
One of the most common flat file types is the CSV format (comma-separated values).

I'll explain how to import a CSV file using pandas. The first line of code below imports the pandas package using the alias pd. The second line reads the .csv file and stores it as a pandas DataFrame using the pd.read_csv() function. The third line prints the shape of the data, and the fourth line displays the first five rows.

In [None]:
import pandas as pd
df = pd.read_csv('restaurants.csv')
print(df.shape)
df.head(5)

(336, 6)


   Unnamed: 0                       name                     addr         city       phone      type
0           0  arnie morton's of chicago  435 s. la cienega blv .  los angeles  3102461501  american
1           1         art's delicatessen      12224 ventura blvd.  studio city  8187621221  american
2           2                  campanile      624 s. la brea ave.  los angeles  2139381447  american
3           3                      fenix   8358 sunset blvd. west    hollywood  2138486677  american
4           4         grill on the alley          9560 dayton way  los angeles  3102760615  american
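
The Unnamed: 0 column in the output above is typical of a CSV file that was saved together with its row index, which is likely what happened here. The sketch below uses a small inline CSV (made-up sample rows, read through io.StringIO instead of a real file) to show how the index_col parameter of pd.read_csv() turns that first column back into the index.

```python
import io

import pandas as pd

# Hypothetical sample: a CSV whose first header cell is empty,
# which is what a file saved together with its index looks like.
csv_text = ",name,city\n0,campanile,los angeles\n1,fenix,hollywood\n"

# Default read: pandas names the header-less first column "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```

With a real file, the same idea would be pd.read_csv('restaurants.csv', index_col=0).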


## Text Files
The other common flat file type is the text file, which also contains textual data, but not necessarily in a tabular format.

The first line of code below reads the text file, again using the pd.read_csv() function. Here the sep parameter specifies the separator between columns (fields) in each row (record), a tab in this case, and the result is saved into a DataFrame. The second line prints the DataFrame.

In [None]:
df_textfile = pd.read_csv('names.txt', sep='\t')
print(df_textfile)

    NAME  AGE  COUNTRY
0  Ahmed   44    Egypt
1   Ali    45      USA
2   Mona   23     Oman
3   Dina   35   France
4   John   21  England
5   Mark   39  Cameron
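
The right value for sep depends on how the file is laid out. As a minimal sketch with made-up inline data (not the contents of names.txt), the example below reads the same table once from tab-separated text and once from text whose columns are padded with runs of spaces, where the regular expression r"\s+" is used as the separator.

```python
import io

import pandas as pd

# Hypothetical sample data in two layouts.
tab_text = "NAME\tAGE\tCOUNTRY\nAhmed\t44\tEgypt\nMona\t23\tOman\n"
space_text = "NAME  AGE  COUNTRY\nAhmed   44    Egypt\nMona    23     Oman\n"

# A single tab separates the fields.
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

# One or more whitespace characters separate the fields.
df_space = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

# Both reads should produce the same table.
print(df_tab.equals(df_space))
```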


Now, you'll see other file types.

## Excel Data
Excel needs no introduction and is one of the most widely used data formats in the business world.

The first line of code below imports and stores the dataset using the pandas pd.ExcelFile() function. The second line prints the sheet names in the file.

In [None]:
excel_data = pd.ExcelFile('restaurants.xlsx')
print(excel_data.sheet_names)

['restaurants_1', 'restaurants_2', 'restaurants_3']


The output shows that the Excel file has three sheets. The parse() method, used below, stores a sheet as a pandas DataFrame; if we don't specify a sheet name, it reads the first sheet by default.

In [None]:
df1 = excel_data.parse()

print(df1.head(2))

   Unnamed: 0                       name                       addr  \
0           0  arnie morton's of chicago   435 s. la cienega blv .    
1           1         art's delicatessen       12224 ventura blvd.    

          city       phone      type  
0  los angeles  3102461501  american  
1  studio city  8187621221  american  


If we want to load only a particular sheet from the Excel file for analysis, we can do that using the first line of code below. The second line prints the first five rows of the data. It is also possible to customize the imports, for example, skipping certain rows, importing only selected columns, or changing variable names.

In [None]:
df_excel = excel_data.parse('restaurants_2')
df_excel.head()

   Unnamed: 0                       name                     addr         city       phone      type
0           0  arnie morton's of chicago  435 s. la cienega blv .  los angeles  3102461501  american
1           1         art's delicatessen      12224 ventura blvd.  studio city  8187621221  american
2           2                  campanile      624 s. la brea ave.  los angeles  2139381447  american
3           3                      fenix   8358 sunset blvd. west    hollywood  2138486677  american
4           4         grill on the alley          9560 dayton way  los angeles  3102760615  american
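
The customizations mentioned above (skipping rows, selecting columns, renaming variables) can be sketched as follows. To keep the example self-contained it reads a small inline CSV with pd.read_csv(); the note line and the sample rows are made up for illustration. The skiprows and usecols parameters are also accepted by ExcelFile.parse(), so the same pattern should carry over to Excel sheets.

```python
import io

import pandas as pd

# Hypothetical export: one note line before the real header row.
raw = (
    "exported by a hypothetical tool\n"
    "name,addr,city,phone\n"
    "campanile,624 s. la brea ave.,los angeles,2139381447\n"
    "fenix,8358 sunset blvd. west,hollywood,2138486677\n"
)

df_custom = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,                # skip the note line above the header
    usecols=["name", "city"],  # import only the selected columns
).rename(columns={"name": "restaurant"})  # change a variable name

print(df_custom)
```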


## Pickled Files
Pickle is a file format native to Python. It is used for many data types for which there is no obvious way to store them, and the data is serialized (converted to a byte stream).

You should import the pickle package first, then use it to load the file.

In [None]:
import pickle
with open('taxi_vehicles.p', 'rb') as file:
    data = pickle.load(file)
print(data)

       vid     make    model  year  fuel_type                owner
0     2767   TOYOTA    CAMRY  2013     HYBRID       SEYED M. BADRI
1     1411   TOYOTA     RAV4  2017     HYBRID          DESZY CORP.
2     6500   NISSAN   SENTRA  2019   GASOLINE       AGAPH CAB CORP
3     2746   TOYOTA    CAMRY  2013     HYBRID  MIDWEST CAB CO, INC
4     5922   TOYOTA    CAMRY  2013     HYBRID       SUMETTI CAB CO
...    ...      ...      ...   ...        ...                  ...
3514  5902   TOYOTA    CAMRY  2013     HYBRID            SAFAR INC
3515  1407  HYUNDAI  ELANTRA  2018   GASOLINE    MYKONOS CAB CORP.
3516   854   TOYOTA    CAMRY  2012     HYBRID      JOELIZ CORP INC
3517  6274   TOYOTA    CAMRY  2012     HYBRID          A K O S INC
3518  4675     FORD   ESCAPE  2011  FLEX FUEL           MAJAZ CORP

[3519 rows x 6 columns]
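
To make the idea of serialization concrete, here is a minimal round trip: a Python object is written to a pickle file with pickle.dump() and read back with pickle.load(). The dictionary and the file name are made up for the example. Keep in mind that unpickling can execute arbitrary code, so only load pickle files from sources you trust.

```python
import os
import pickle
import tempfile

# Hypothetical object that has no obvious flat-file representation.
record = {"vid": 2767, "make": "TOYOTA", "years": [2012, 2013]}

# Write to a temporary location so the example leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), "record.p")

# Serialize: 'wb' because a pickle file is a byte stream.
with open(path, "wb") as file:
    pickle.dump(record, file)

# Deserialize: 'rb' for the same reason.
with open(path, "rb") as file:
    restored = pickle.load(file)

print(restored == record)
```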


## SAS and Stata Files
These are data files used in business analytics, biostatistics, and academic social science research.

We import SAS7BDAT from the sas7bdat package, then use it to load the data into a DataFrame.

In [None]:
from sas7bdat import SAS7BDAT
with SAS7BDAT('sales.sas7bdat') as file:
    df_sas = file.to_data_frame()
df_sas.head(2)

     YEAR     P           S
0  1950.0  12.9  181.899994
1  1951.0  11.9       245.0


For Stata we don't need to import anything besides pandas, which provides a function for Stata files (pd.read_stata).

In [None]:
df_stata = pd.read_stata('disarea.dta')
df_stata.head(2)

Unnamed: 0,wbcode,country,disa1,disa2,disa3,disa4,disa5,disa6,disa7,disa8,...,disa16,disa17,disa18,disa19,disa20,disa21,disa22,disa23,disa24,disa25
0,AFG,Afghanistan,0.0,0.0,0.76,0.73,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
1,AGO,Angola,0.32,0.02,0.56,0.0,0.0,0.0,0.56,0.0,...,0.0,0.4,0.0,0.61,0.0,0.0,0.99,0.98,0.61,0.0


## HDF5
HDF5 is a standard for storing large quantities of numerical data (hundreds of gigabytes or terabytes).

In [None]:
import h5py
data = h5py.File('test.hdf5', 'r')
print(type(data))

<class 'h5py._hl.files.File'>
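
An h5py File object behaves much like a nested dictionary of groups and datasets. Since the structure of test.hdf5 isn't shown here, the sketch below first writes a tiny file with made-up group and dataset names, then reopens it to list the keys and read a dataset into a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny HDF5 file so the example is self-contained.
# The names "measurements" and "strain" are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "example.hdf5")
with h5py.File(path, "w") as f:
    group = f.create_group("measurements")
    group.create_dataset("strain", data=np.arange(5, dtype=np.float64))

# Reopen read-only and walk the hierarchy like a nested dictionary.
with h5py.File(path, "r") as f:
    top_level = list(f.keys())                      # groups at the root
    dataset_names = list(f["measurements"].keys())  # datasets in the group
    strain = f["measurements/strain"][:]            # slicing copies into a NumPy array

print(top_level, dataset_names, strain)
```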


## MATLAB
MATLAB files are a standard type in engineering and science.

To import a .mat file we import scipy.io; its loadmat function gives us the data as a dictionary.

In [None]:
import scipy.io
filename = 'ja_data2.mat'
mat = scipy.io.loadmat(filename)
print(type(mat))

<class 'dict'>
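
The dictionary returned by loadmat maps MATLAB variable names to NumPy arrays, alongside a few metadata entries whose names start with double underscores. As the contents of ja_data2.mat aren't shown here, this sketch saves a made-up variable with scipy.io.savemat() and loads it back to illustrate the layout.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a small .mat file with one hypothetical variable named "signal".
path = os.path.join(tempfile.mkdtemp(), "example.mat")
scipy.io.savemat(path, {"signal": np.array([1.0, 2.0, 3.0])})

mat = scipy.io.loadmat(path)

# Skip metadata entries such as __header__ and __version__.
variable_names = [key for key in mat if not key.startswith("__")]

# MATLAB arrays are at least 2-D, so the vector comes back as a 1x3 array.
signal = mat["signal"]

print(variable_names, signal.shape)
```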


## SQL Database
Relational databases are a prominent source of data storage for many organizations, and it is extremely important to know how to import data from such databases. Structured Query Language (SQL) is the most widely used language for querying relational databases, and we can import data from tables stored in a database (a SQLite file in this example) by building a connection.

The first step is to import the required packages and functions. The sqlalchemy package is used in the illustration below. 
Then you need to query the database using these steps:
- create database engine
- connect to engine
- query the database
- save the query result to a dataframe
- close the connection

In [None]:
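from sqlalchemy import create_engine, inspect
engine = create_engine('sqlite:///Chinook.sqlite')
# inspect() lists the tables; engine.table_names() was removed in SQLAlchemy 2.0
table_names = inspect(engine).get_table_names()
print(table_names)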

['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


In [None]:

from sqlalchemy import text
con = engine.connect()
# Raw SQL strings must be wrapped in text() in SQLAlchemy 2.0
rs = con.execute(text("SELECT * FROM Album"))
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
con.close()

df.head()

   AlbumId                                  Title  ArtistId
0        1  For Those About To Rock We Salute You         1
1        2                      Balls to the Wall         2
2        3                      Restless and Wild         2
3        4                      Let There Be Rock         1
4        5                               Big Ones         3
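
pandas can also run the query and build the DataFrame in a single call with pd.read_sql_query(), which accepts a SQLAlchemy connection or a plain sqlite3 connection. The sketch below is an alternative to the steps above, not a replacement for them: it uses Python's built-in sqlite3 module and an in-memory database as a stand-in for Chinook.sqlite, filled with two rows copied from the output above.

```python
import sqlite3

import pandas as pd

# In-memory database standing in for Chinook.sqlite (assumption for the demo).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Album (AlbumId INTEGER, Title TEXT, ArtistId INTEGER)")
con.executemany(
    "INSERT INTO Album VALUES (?, ?, ?)",
    [
        (1, "For Those About To Rock We Salute You", 1),
        (2, "Balls to the Wall", 2),
    ],
)
con.commit()

# Query and load in one step; the ? placeholder is filled from params.
df_album = pd.read_sql_query(
    "SELECT * FROM Album WHERE ArtistId = ?", con, params=(1,)
)
con.close()

print(df_album)
```

Column names come from the query result, so there is no need to set df.columns by hand.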


## Importing Data from URL
Often data is available on a website and can be downloaded to a local system. We can also load the data directly from a website URL (Uniform Resource Locator) into the Python environment.


We will use the urllib library for this task, as it provides the interface for fetching data across the web. The first two lines of code below import the required libraries. The third line assigns the URL of the file that we want to import into Python. The fourth line uses the urlretrieve function to save the file in the local environment. The last three lines read the file into a DataFrame and print the shape and the first few observations of the dataset. This file separates its fields with semicolons, so sep=';' is passed to pd.read_csv(); without it, every row is read as a single column and the shape comes out as (4898, 1).

In [None]:
import urllib
from urllib.request import urlretrieve

url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
urlretrieve(url, 'ww.csv')

df_url = pd.read_csv('ww.csv', sep=';')
print(df_url.shape)
df_url.head(5)

(4898, 12)
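
pd.read_csv() also accepts a URL directly, which skips the separate download step. To stay runnable without network access, the sketch below points a file:// URL at a small temporary file with made-up, semicolon-separated values; an http URL would be passed in the same way. It also shows why the separator matters for files like this one.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical semicolon-separated file standing in for a remote CSV.
csv_path = Path(tempfile.mkdtemp()) / "sample.csv"
csv_path.write_text("acidity;ph;quality\n7.0;3.0;6\n6.3;3.3;6\n")

# A file:// URL for the temporary file; an http(s) URL works the same way.
file_url = csv_path.as_uri()

# Default separator (comma): each row ends up in a single column.
df_wrong = pd.read_csv(file_url)

# Correct separator: the three columns are split as intended.
df_right = pd.read_csv(file_url, sep=";")

print(df_wrong.shape)
print(df_right.shape)
```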
