# Read CSV in pandas
<b>The read_csv() function in pandas reads data from a CSV file into a DataFrame.</b>

In [None]:
# basic use of read_csv(): load a local CSV file into a DataFrame
import pandas as pd
df = pd.read_csv(r"C:\Users\ASHRAF\Downloads\people_data (1).csv")
print(df)

# read_csv:
<b>Syntax and parameters</b><br>
<b>Syntax:</b><br>
pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)<br>

<b>Parameters:</b> <br>

<b>filepath_or_buffer:</b> Location of the CSV file. It accepts a string path, a URL, or a file-like object.<br>
<b>sep:</b> The separator (delimiter) between fields; the default is ','.<br>
<b>header:</b> Row number (an int or a list of ints) to use as the column names; the data starts after that row. The default, 'infer', uses the first row.<br>
With header=None, no row is treated as a header and the columns are labeled 0, 1, 2, and so on.<br>
<b>usecols:</b> Reads only the selected columns from the CSV file.<br>
<b>nrows:</b> Number of rows to read from the file.<br>
<b>index_col:</b> Column(s) to use as the row labels. If None, pandas assigns a default integer index (0, 1, 2, ...).<br>
<b>skiprows:</b> Line numbers to skip (0-indexed), or the number of lines to skip at the start of the file.
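
As a minimal sketch of how these parameters combine, the example below reads a small in-memory CSV instead of a file. The sample text and the variable names (`csv_text`, `subset`, `raw`) are invented for illustration and are not part of the dataset used elsewhere in this notebook.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for this illustration.
csv_text = (
    "First Name;Last Name;Email\n"
    "Ada;Lovelace;ada@example.com\n"
    "Alan;Turing;alan@example.com\n"
    "Grace;Hopper;grace@example.com\n"
)

# sep=";" overrides the default comma, usecols keeps two columns,
# index_col turns one of them into the row labels,
# and nrows stops after two data rows.
subset = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    usecols=["First Name", "Email"],
    index_col="First Name",
    nrows=2,
)
print(subset)

# header=None treats every line as data and labels the columns 0, 1, 2.
raw = pd.read_csv(io.StringIO(csv_text), sep=";", header=None)
print(raw.columns.tolist())
```

Because `io.StringIO` behaves like an open file, the same calls should work unchanged with a real path or URL in place of the buffer.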

<h1>Features in read_csv()</h1>

<b>1. Read specific column using usecols</b>

In [None]:
# read only the Last Name and Email columns from the CSV
import pandas as pd
read = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv",
                   usecols=["Last Name","Email"])
print(read)


<b>2. Setting an index column (index_col)</b><br>
One or more columns can be set as the DataFrame index.

In [None]:
# use the First Name column as the index
set_in = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv",
                    index_col=["First Name"])
print(set_in)

<b>3. Handling missing values (na_values)</b><br>
The na_values parameter lists additional strings that read_csv() should treat as missing values (NaN).<br>
This CSV file has no missing values, so the parameter is not needed here.
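
Since the dataset above has nothing missing, here is a minimal sketch of na_values on a small invented CSV. The sample text, the placeholder strings "missing" and "-", and the variable name `scores` are assumptions made for this illustration.

```python
import io
import pandas as pd

# Hypothetical data in which "missing" and "-" stand for absent scores.
csv_text = (
    "Name,Score\n"
    "Alice,90\n"
    "Bob,missing\n"
    "Carol,-\n"
)

# Tell read_csv which extra strings should be read as NaN.
scores = pd.read_csv(io.StringIO(csv_text), na_values=["missing", "-"])
print(scores)

# Count the cells in the Score column that were read as missing.
print(scores["Score"].isna().sum())
```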

<b>4. nrows in read_csv()</b>

In [None]:
import pandas as pd 
# store the URL, then read the CSV from it
url = "https://media.geeksforgeeks.org/wp-content/uploads/20241121154629307916/people_data.csv"
# read only the first 3 rows
call_row = pd.read_csv(url, nrows=3)
print(call_row)

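<b>5. skiprows</b><br>
Skips the given lines when reading the file. Line numbers are 0-indexed and line 0 is the header, so skiprows=[1,2] drops the first two data rows.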

In [None]:
data = pd.read_csv(url)
print("original CSV file\n", data)
# skip lines 1 and 2 (the first two data rows; line 0 is the header)
new_data = pd.read_csv(url, skiprows=[1, 2])
print("CSV file with rows skipped\n", new_data)

<b>6. parse_dates in read_csv</b><br>
The parse_dates parameter converts date columns into datetime objects,<br>
simplifying operations like filtering, sorting, or time-based analysis.

In [None]:
p_d = pd.read_csv(url,parse_dates=["Date of birth"])
# info() prints its summary itself and returns None, so print() is not needed
p_d.info()

# Saving pandas DataFrame as CSV

In [None]:
# making simple DataFrame
import pandas as pd
df = pd.DataFrame({"Name":["Alice","Bob","Stephen","Jhon"],
                  "Degree":["MBA","BSc","BBA","Phd"],
                  "Score":[90,80,85,70]})
print(df)

<b>1. Export CSV to a Working Directory</b>

In [None]:
# create a CSV file in the current working directory
df.to_csv("file1.csv")
# read the file back to check it; the saved index comes back as an "Unnamed: 0" column
t1 = pd.read_csv("file1.csv")
print(t1)

<b>2. Saving CSV Without Headers and Index</b>

In [None]:
# create a CSV file without the header row and the index
df.to_csv("file2.csv", header=False, index=False)
# read it back with header=None so the first data row is not used as column names
print(pd.read_csv("file2.csv", header=None))

<b>3. Save the CSV file to a Specified Location</b>

In [None]:
# save the file to a specific folder
df.to_csv(r"C:\Users\ASHRAF\OneDrive\Desktop\all_of_pandas\file3.csv", index=False)

In [None]:
# confirm that the file was created
import os
file_path = r"C:\Users\ASHRAF\OneDrive\Desktop\all_of_pandas\file3.csv"
if os.path.exists(file_path):
    print("Found this file")
else:
    print("Not found")

<b>4. Write a DataFrame to CSV file using Tab Separator</b>

In [None]:
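print("original DataFrame (df):\n", df)
# write df using "\t" (tab) as the separator
df.to_csv("file4.csv", sep="\t", header=True, index=False)
# read file4 back with the same separator so the columns are split correctly
print("DataFrame read back from the tab-separated file:\n", pd.read_csv("file4.csv", sep="\t"))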


# Export a pandas DataFrame to a CSV file
<b>CSV = Comma-Separated Values</b><br>
In pandas, the .to_csv() method exports a DataFrame to a CSV file.

In [None]:
# create a simple DataFrame
import pandas as pd
df = pd.DataFrame({"Name":["a","b","c","d"],
                  "Score":[90,80,85,70]})
print(df)

In [None]:
# basic export (the index is written as well by default)
df.to_csv("Name_score.csv")
# check that the file exists
import os
file_loc = "Name_score.csv"
if os.path.exists(file_loc):
    print("Found")
else:
    print("Not Found")

<b>Customizing the CSV Export</b><br>
<b>1. Remove the index column:</b> .to_csv("file.csv", index=False). The default is index=True, which writes the index.<br>
<b>2. Export only selected columns:</b> .to_csv("...", columns=[...]). List the columns you want inside the brackets,<br>
for example columns=["Name"] to export only the Name column.<br>
<b>3. Exclude the header row:</b> pass header=False. The default is header=True.<br>
<b>4. Handle missing values:</b> by default, missing values (NaN) are written as empty strings; use the na_rep parameter to write a custom placeholder instead.<br>
<b>5. Change the column separator:</b> CSV uses a comma (,) by default. In some cases other delimiters may be required, such as<br>
tabs (\t), semicolons (;), or pipes (|); set them with the sep parameter.
Using a different delimiter can make the file more readable or compatible with specific systems.
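
The options above can be sketched together as follows. This is only an illustration: the DataFrame `df_demo` and the placeholder "N/A" are invented here, and the example relies on to_csv() returning the CSV text as a string when no file path is given.

```python
import pandas as pd

# Hypothetical DataFrame with one missing score.
df_demo = pd.DataFrame({"Name": ["a", "b", "c"], "Score": [90, None, 85]})

# No path argument: to_csv returns the CSV text instead of writing a file.
csv_out = df_demo.to_csv(
    index=False,                 # 1. leave out the index column
    columns=["Name", "Score"],   # 2. export only these columns
    na_rep="N/A",                # 4. placeholder for missing values
    sep=";",                     # 5. semicolon instead of comma
)
print(csv_out)

# 3. header=False drops the header row.
no_header = df_demo.to_csv(index=False, header=False)
print(no_header)
```

The Score column is stored as floats because it contains a missing value, so 90 is written as 90.0.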

# Reading JSON files with pandas
JSON = JavaScript Object Notation <br>
There are three main ways to read JSON data with pandas:<br>
<b>1. Using the pd.read_json() method</b><br>
<b>2. Using the json module and the pd.json_normalize() method</b><br>
<b>3. Using the pd.DataFrame() constructor</b>

In [None]:
# 1. pd.read_json()
import pandas as pd
df = pd.read_json(r"C:\Users\ASHRAF\Downloads\file.json")
print(df)

<b>2. Using json Module and pd.json_normalize() method:</b><br>
json_normalize() is used when working with nested JSON structures.<br>
JSON from APIs often comes in nested form, and this method flattens it into a tabular format that is easier to work with in pandas.
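
To make the flattening concrete, here is a minimal sketch on a small nested structure. The `records` list is invented for illustration and only imitates the shape an API response might have.

```python
import pandas as pd

# Hypothetical nested records, similar in shape to a typical API response.
records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Alan", "address": {"city": "Wilmslow"}}},
]

# Nested keys are joined with "." to form flat column names.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
print(flat)
```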






In [None]:
import pandas as pd
import json
data = {"One":{"0":60,"1":60,"2":40,"3":50,"4":79,"5":90},
       "Two":{"0":32,"1":444,"2":53,"3":78,"4":88,"5":43}}
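# serialize the dictionary to a JSON string
json_data = json.dumps(data)

# parse the string and flatten it: the result is a single row with columns One.0, One.1, ..., Two.5
df_normalize = pd.json_normalize(json.loads(json_data))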
print("\nDataFrame using JSON module and `pd.json_normalize()` method:\n",df_normalize)

<b>3. Using pd.DataFrame with a Dictionary</b>

In [None]:
df = pd.DataFrame(data)
print(df)

# Parsing JSON Dataset
If you are fetching JSON from a web URL or an API, you will also need the requests library<br>
(import requests).

<b>Create a DataFrame and Convert It to JSON</b><br>
<b>Use .to_json() to convert a DataFrame to JSON.</b><br>
JSON orientations:<br>
<b>orient='split':</b> separates columns, index and data clearly.<br>
<b>orient='index':</b> shows each row as a key-value pair with its index.

In [None]:
import pandas as pd
# create a simple DataFrame
df = pd.DataFrame([['a','b'],['c','d']],index=['row1','row2'],columns=['col1','col2'])
print(df.to_json(orient='split'))
print(df.to_json(orient='index'))

<h1>Read the JSON File directly from Web Data</h1><br>
Functions used to read JSON from the web in Python (requests.get() and response.json() come from the requests library, json_normalize() from pandas):<br>
<b>requests.get(url):</b> fetches data from the URL.<br>
<b>response.json():</b> converts response to a Python dictionary/list.<br>
<b>json_normalize():</b> converts nested JSON into a flat table.

In [None]:
import pandas as pd
import requests
url ="https://jsonplaceholder.typicode.com/posts"
response = requests.get(url)
data = pd.json_normalize(response.json())
data.head()

<h1>Handling Nested JSON in Pandas:</h1><br>
To turn nested JSON into a table, use json_normalize() from pandas,<br>
which makes the data easier to analyze or manipulate in tabular form.<br>
<b>json.load(f):</b> Loads the raw JSON into a Python dictionary.<br>
<b>json_normalize(d['programs']):</b> Extracts the list under the programs key and flattens it into columns.
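
The same steps can be tried without a file. The sketch below uses a small invented dictionary with a programs key; its fields (season, works, title) are assumptions for illustration and are not taken from raw_nyc_phil.json. It also shows the record_path and meta arguments of json_normalize(), which expand a nested list into one row per item.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary returned by json.load(f).
d = {
    "programs": [
        {"season": "2020-21", "works": [{"title": "Work A"}, {"title": "Work B"}]},
        {"season": "2021-22", "works": [{"title": "Work C"}]},
    ]
}

# One row per program; the nested "works" list stays in a single column.
programs = pd.json_normalize(d["programs"])
print(programs)

# One row per work, with the parent program's season repeated alongside.
works = pd.json_normalize(d["programs"], record_path="works", meta=["season"])
print(works)
```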

In [None]:
import json
import pandas as pd
from pandas import json_normalize
with open(r"C:\Users\ASHRAF\OneDrive\Desktop\all_of_pandas\raw_nyc_phil.json") as f:
    d = json.load(f)

nycphil = json_normalize(d['programs'])
nycphil.head(3)

# Exporting Pandas DataFrame to JSON:
pandas, a powerful Python library for data manipulation, provides the to_json() function<br>
to convert a DataFrame into a JSON file and the read_json() function to read a JSON file into a DataFrame.<br>

In [None]:
# exporting a simple dataframe
import pandas as pd
data = pd.DataFrame([['a','b','c'],['d','e','f'],['g','h','i']],
                   index=['row1','row2','row3'],
                   columns=['col1','col2','col3'])
# convert to json
data.to_json("file_json1.json",orient="split",compression="infer",index=True)

# read json file
df = pd.read_json("file_json1.json",orient="split",compression="infer")
print(df)

<h1>JSON Orientations in Pandas:</h1><br>
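<b>1. records:</b> List of dictionaries, one per row.<br>
<b>2. columns:</b> Dictionary keyed by column label; each value maps index labels to values.<br>
<b>3. index:</b> Dictionary keyed by row index; each value maps column labels to values.<br>
<b>4. split:</b> Dictionary with separate index, columns, and data entries.<br>
<b>5. table:</b> JSON Table Schema: a dictionary with schema and data entries.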
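
To compare the orientations side by side, the following sketch converts one small DataFrame with each of them. The DataFrame `df_o` and the dictionary `as_json` are names invented for this illustration.

```python
import json
import pandas as pd

# Small hypothetical DataFrame with labeled rows.
df_o = pd.DataFrame(
    {"col1": ["a", "c"], "col2": ["b", "d"]},
    index=["row1", "row2"],
)

# Convert with every orientation and parse the result back into Python objects.
as_json = {}
for orient in ["records", "columns", "index", "split", "table"]:
    text = df_o.to_json(orient=orient)
    as_json[orient] = json.loads(text)
    print(orient, "->", text)
```

Note that records drops the row labels entirely, so it suits data whose index carries no meaning.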

In [None]:
import pandas as pd

df = pd.DataFrame(data=[
    ['15135', 'Alex', '25/4/2014'],
    ['23515', 'Bob', '26/8/2018'],
    ['31313', 'Martha', '18/1/2019'],
    ['55665', 'Alen', '5/5/2020'],
    ['63513', 'Maria', '9/12/2020']],
    columns=['ID', 'NAME', 'DATE OF JOINING'])

df.to_json('file_json2.json', orient='split', compression='infer')

df = pd.read_json('file_json2.json', orient='split', compression='infer')
print(df)

# Reading Excel files using pandas
* To read an Excel file, use <b>pd.read_excel("file_name")</b>.<br>
* If the Excel file has multiple sheets, read each sheet into its own DataFrame with the <b>sheet_name</b> (and optionally <b>index_col</b>) parameters, then combine them with <b>pd.concat()</b>.<br>
* To view the first or last 5 rows of the DataFrame, use the <b>head()</b> and <b>tail()</b> methods.<br>Both also accept a number as an argument for how many rows to show.

<h2>sort_values() method in Pandas:</h2><br>
To sort a DataFrame by a column, for example one that contains numerical data, use the <b>variable.sort_values()</b> method and pass the column name.<br>
<h2>Pandas describe() method:</h2><br>
Now, suppose our data is mostly numerical.<br>
We can get statistical information such as the mean, max, and min of the DataFrame using the <b>variable.describe()</b> method.<br>
<b>Note: once loaded, the data is an ordinary DataFrame, just as with JSON or CSV files.</b>
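
The sketch below walks through these steps without needing an actual workbook. The dictionary `sheets` is an invented stand-in for what pd.read_excel("file.xlsx", sheet_name=None) returns, namely a dictionary that maps each sheet name to a DataFrame; the sheet names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("file.xlsx", sheet_name=None).
sheets = {
    "Sheet1": pd.DataFrame({"Name": ["a", "b"], "Score": [90, 70]}),
    "Sheet2": pd.DataFrame({"Name": ["c", "d"], "Score": [85, 80]}),
}

# Stack the sheets into one DataFrame with a fresh 0..n-1 index.
combined = pd.concat(list(sheets.values()), ignore_index=True)

print(combined.head(2))                # first 2 rows
print(combined.tail(2))                # last 2 rows
print(combined.sort_values("Score"))   # rows ordered by Score, ascending
print(combined.describe())             # count, mean, std, min, quartiles, max
```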

# Read Text Files with Pandas:
Below are the methods by which we can read text files with Pandas:<br>
<h2>Using read_csv()</h2>
<h2>Using read_table()</h2>
<h2>Using read_fwf()</h2>

<h1>1. using read_csv():</h1>
<b>syntax:</b><br>
data=pandas.read_csv('filename.txt', sep=' ', header=None, names=["Column1", "Column2"])

<h1>2. using read_table():</h1>
This function reads a general delimited file into a DataFrame object.<br> It is essentially the same as the read_csv() function, but its default delimiter is a tab ('\t') instead of a comma.<br> 
<b>syntax:</b><br>
     data=pandas.read_table('filename.txt', delimiter = ' ')

<h1>3. using read_fwf():</h1>
read_fwf() reads a file of fixed-width formatted lines and splits its contents into separate columns.<br>
<b>syntax:</b><br>
data=pandas.read_fwf('filename.txt')
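
A minimal sketch of the three readers, using in-memory text in place of filename.txt. The sample strings and the column names id and fruit are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical text contents: space-separated, tab-separated, and fixed-width.
space_text = "1 apple\n2 banana\n"
tab_text = "1\tapple\n2\tbanana\n"
fixed_text = (
    "id  fruit \n"
    "1   apple \n"
    "2   banana\n"
)

# read_csv with an explicit separator and column names.
via_csv = pd.read_csv(io.StringIO(space_text), sep=" ", header=None, names=["id", "fruit"])

# read_table splits on tabs by default.
via_table = pd.read_table(io.StringIO(tab_text), header=None, names=["id", "fruit"])

# read_fwf infers the column boundaries from the aligned whitespace.
via_fwf = pd.read_fwf(io.StringIO(fixed_text))

print(via_csv)
print(via_table)
print(via_fwf)
```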

# Converting a text file to a CSV file
1. First read the text file into a DataFrame: <b>df=pd.read_csv("file.txt",header=None,delimiter='/')</b><br>
2. Then write it out as CSV: <b>df.to_csv("file_name",index=...,header=...)</b>
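
The two steps can be sketched on in-memory text as follows. The sample text and the column names Name and Score are invented for illustration; passing a list to header writes those names in place of the default numeric labels.

```python
import io
import pandas as pd

# Hypothetical contents of a "/"-delimited text file.
text = "Alice/90\nBob/80\n"

# Step 1: read the text into a DataFrame (columns are labeled 0 and 1).
df_txt = pd.read_csv(io.StringIO(text), header=None, delimiter="/")

# Step 2: write it as CSV text, without the index and with chosen column names.
csv_text = df_txt.to_csv(index=False, header=["Name", "Score"])
print(csv_text)
```

Passing a file name as the first argument of to_csv() would write the same content to disk instead of returning it.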