-Read CSV file
-
The CSV file (Data Science Job Salaries Dataset) -Analysis Data With python
-
with use pandas library (Put data in a frame,explore data ,add columns,replace values, change column name ,remove column ) -Loding the data into SQL server
-
With use pyodbc and sqlalchemy librarys connect to the sql server and loding the data into database
-
Step 1 : Read the data and explore data
#installing pandas data manipulation and analysis !pip install pandas --------------------------- import pandas as pd #read csv file with pandas ds_df=pd.read_csv(r'ds_salaries.csv') ds_df.head(4) ----------------------------- #show the laast 5 rows from the table ds_df.tail()
#show the columns ds_df.columns ---------------------- #show the columns data tyoe ds_df.dtypes ------------------- #total the columns and rows ds_df.shape ---------------------------- #show the max value ds_df['salary_in_usd'].max()
#how vlaues is null ds_df.isnull().sum() --------------------------- #show unique values and print it print('employment_type is',ds_df['employment_type'].unique()) print('experience_level is',ds_df['experience_level'].unique()) print('company_size is',ds_df['company_size'].unique()) ----------------------------------
-
Step 2 :Add columns and change column data type
#creat new column his name salary in SR ds_df['salary in SR']=ds_df['salary_in_usd']*3.75 ds_df.head() ---------------------------------- #show the salary in SR column SR data type ds_df['salary in SR'].dtype ----------------------------- #changing salary in SR data type ds_df['salary in SR']=ds_df['salary in SR'].astype(int) ds_df['salary in SR'].dtype ------------------------------------
#show the max and min values max=ds_df['salary in SR'].max() min=ds_df['salary in SR'].min() print('The max value ',max) print('The min value ',min) ---------------------- #creat new column his name salary in month ds_df['salary in month ']=ds_df['salary in SR']/12 ds_df.head()
-
Step 3 : Replace value and remove columns
#replace values ds_df['company_size'].replace({'L': 'large','S': 'small','M':'Medium'} ,inplace=True) ds_df['company_size']
#remove columns from table
ds_df.drop(columns=['salary''salary_currency''remote_ratio'] ,inplace=True)
…
-
Step 4 : Loding into SQL server
#install pyodbc provides access to ODBC (Open Database Connectivity) databases. !pip install pyodbc ------------------------- #install sqlalchemy and impotr it( provides a flexible and efficient way to work with databases.) #creat connect with SQL server(AAZ\SQLEXPRESS:server name ,lod=database name ) ds_df.to_sql('work_year',con=con,index=False,if_exists='replace')
#checking in sql serverserver
#creat new column his name new colimn ds_df['new colimn']=ds_df['salary in SR']/20 ds_df.head()
#checking in sql server and rename the table name and add index column