This repository contains code and case studies covering the topics I learned on pandas and exploratory data analysis (EDA).
pandas is an open-source Python library for data manipulation and analysis.
Data processing: in simple terms, processing means cleaning the data. This step is important because unprocessed data can introduce noise and errors and reduce the accuracy of the model.
Some common data issues: inconsistent data columns, missing data, duplicate data, and outliers.
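The data issues named above can each be detected with a few pandas calls. The following is a minimal sketch, not the code used in this repository: the toy DataFrame, its column names, and the 1.5 × IQR outlier rule are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: names and values are made up for illustration.
df = pd.DataFrame({
    "Name ": ["Ann", "Bob", "Bob", "Cid", "Dee", "Eve"],
    "AGE": [25, 31, 31, np.nan, 29, 240],
})

# Inconsistent column names: strip stray whitespace and lowercase them.
df.columns = df.columns.str.strip().str.lower()

# Missing data: count NaN values in each column.
missing_per_column = df.isna().sum()

# Duplicate data: count rows that repeat an earlier row, then drop them.
n_duplicates = int(df.duplicated().sum())
deduped = df.drop_duplicates()

# Outliers: flag values outside 1.5 * IQR of the quartiles (one common rule,
# assumed here; other rules may suit other data better).
q1, q3 = deduped["age"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (deduped["age"] < q1 - 1.5 * iqr) | (deduped["age"] > q3 + 1.5 * iqr)
outliers = deduped[is_outlier]

print(missing_per_column)
print("duplicate rows:", n_duplicates)
print(outliers)
```

Whether to drop, fill, or keep the flagged rows depends on the dataset, so the sketch only reports them.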
- python_pandas basic introduction
- creating a dataframe from dictionary
- pandas series
- methods to access the dataframes ( loc, iloc )
- using logical operators on dataframes (np.logical_and, np.logical_or, np.logical_not)
- Looping over dataframes
- deleting a row
- importing files in pandas
- Text files
- Flat files
- Pickle files
- Excel files
- CSV files
- importing files from web
- JSON files
- importing multiple files
- Working on Dataframes in pandas
- subsetting the dataframe
- slicing rows and columns
- Filtering
- map()
- Vectorized functions
- Indexing
- Pivoting dataframe
- stacking and unstacking
- Melt
- pivot table
- groupby()
- outliers
- Appending
- Concatenating
- pattern matching
- converting to datatypes
- String manipulation using regular expression
- Duplicate Data
- Missing data
- assert statements
- statistics
- Visualization
- Matplotlib
- line plot
- histogram
- box plot
- scatter plot
- time series and datetime indexing
- case study on Life Expectancy
- case study on Olympic medals
- case study on Austin weather changes using a datetime index
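A few of the topics listed above (loc and iloc, filtering, melt, groupby, pivot table, and datetime indexing) fit in one short sketch. The data and the names `wide`, `long_df`, and `monthly` are hypothetical and are not taken from the case studies in this repository.

```python
import pandas as pd

# Hypothetical wide-format data: one temperature column per city.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-02-01"]),
    "austin": [70, 75, 80],
    "dallas": [68, 72, 78],
})

# loc selects by label, iloc selects by integer position.
by_label = wide.loc[0, "austin"]
by_position = wide.iloc[0, 1]

# Filtering: keep rows where a boolean condition holds.
warm = wide[wide["austin"] > 72]

# melt: reshape from wide to long (one row per date and city).
long_df = wide.melt(id_vars="date", var_name="city", value_name="temp")

# groupby: mean temperature per city.
mean_by_city = long_df.groupby("city")["temp"].mean()

# pivot_table: reshape from long back to wide.
back = long_df.pivot_table(index="date", columns="city", values="temp")

# Datetime index: set the date as the index, then resample to monthly means.
monthly = wide.set_index("date")["austin"].resample("MS").mean()

print(long_df)
print(mean_by_city)
print(monthly)
```

Setting the date column as the index is what makes `resample` available, which is the main reason time series data is usually indexed by datetime.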
Note: more changes will be made and content will be added as I make progress.