


### Nanjing University Computational Communication Course Series
***
***
# Programming Foundations for Computational Communication
***
***

王成军 

wangchengjun@nju.edu.cn

Computational Communication website: http://computational-communication.com

<img align="right" style="padding-right:10px;" src="figures/logo.png">


<!--BOOK_INFORMATION-->
<img align="right" width="300px" style="padding-right:10px;" src="figures/PDSH-cover.png">
*[Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT).*

If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!


# Preface

## What Is Data Science?

This course introduces how to do data science with Python. Critics sometimes dismiss the term "data science" as:

- a superfluous label
    - what science doesn't involve data?
- a simple buzzword
    - used only to salt resumes and catch the eye of overzealous tech recruiters.

> Data science, despite its hype-laden veneer, is perhaps the best label we have for the cross-disciplinary set of skills that are becoming increasingly important in many applications across industry and academia.

![Data Science Venn Diagram](figures/Data_Science_VD.png)

The best existing definition of data science is illustrated by Drew Conway's Data Science Venn Diagram, first published on his blog in September 2010.

<small>(Source: [Drew Conway](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram). Used by permission.)</small>

## The essence of "data science"

- A fundamentally *interdisciplinary* subject.

Data science comprises three distinct and overlapping areas: 
- the skills of a *statistician* who knows how to model and summarize datasets (which are growing ever larger); 
- the skills of a *computer scientist* who can design and use algorithms to efficiently store, process, and visualize this data; 
- the *domain expertise*—what we might think of as "classical" training in a subject—necessary both to formulate the right questions and to put their answers in context.

### Data science is not a new domain of knowledge to learn, but a new set of skills that you can apply within your current area of expertise.

- reporting election results 
- forecasting stock returns
- optimizing online ad clicks 
- identifying microorganisms in microscope photos 
- seeking new classes of astronomical objects
- working with data in any other field

The goal of data science is to **give you the ability to ask and answer new questions** about your chosen subject area.

## Who Is This Course For?

One of the most common questions I hear is **"how should I learn Python?"** It usually comes from people who:

- already have a strong background in writing code and using computational and numerical tools;
- want to learn the language with the aim of using it as a tool for data-intensive and computational science.

> While a large patchwork of videos, blog posts, and tutorials for this audience is available online, I've long been frustrated by the lack of a single good answer to this question; that is what inspired this book.



## The course is not meant to be an introduction to Python or to programming in general.

I assume the reader has familiarity with the Python language, including:
- defining functions
- assigning variables
- calling methods of objects
- controlling the flow of a program
- and other basic tasks.
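
As a quick self-check, the short sketch below touches each of the skills listed above. The function name `word_counts` and the sample sentence are made up for illustration; if every line reads comfortably, you likely have the background this course assumes.

```python
def word_counts(text):
    """Count how often each lowercase word occurs in a string."""
    counts = {}                          # assigning a variable
    for word in text.lower().split():    # calling methods of objects
        if word.isalpha():               # controlling the flow of a program
            counts[word] = counts.get(word, 0) + 1
    return counts


# Calling the function we just defined
result = word_counts("Data science is data plus science")
print(result)
```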

If you are looking for a guide to the Python language itself, try "[A Whirlwind Tour of the Python Language](https://github.com/jakevdp/WhirlwindTourOfPython)".

## This course is meant to help Python users learn data science, and effectively store, manipulate, and gain insight from data.

The course covers Python's data science stack, which includes libraries such as:
- Jupyter notebook
- NumPy
- Pandas
- Matplotlib
- Scikit-Learn
- and related tools



## Why Python?

Over the last couple of decades, Python has emerged as a first-class tool for:

- **scientific computing**
- **analysis and visualization of large datasets**

Python was not specifically designed with data analysis or scientific computing in mind.

## Why Python?

The usefulness of Python for data science stems primarily from the large and active ecosystem of third-party packages: 

- *NumPy* for manipulation of homogeneous array-based data
- *Pandas* for manipulation of heterogeneous and labeled data
- *SciPy* for common scientific computing tasks
- *Matplotlib* for publication-quality visualizations
- *Jupyter notebook* for interactive execution and sharing of code
- *Scikit-Learn* for machine learning
- and many more tools.

### Python 2 vs Python 3

- This book uses the syntax of Python 3.
- Python 2 reached its end of life on January 1, **2020**, and is no longer supported.
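
For readers coming from Python 2, here is a minimal sketch of three Python 3 behaviors that show up constantly in data work. It is only an illustration, not a complete migration guide, and the variable names are chosen for this example.

```python
import sys

# print is a function in Python 3 (it was a statement in Python 2)
print("Running Python", sys.version_info.major)

# "/" is true division in Python 3; "//" is floor division
half = 7 / 2        # 3.5
floored = 7 // 2    # 3

# str holds Unicode text by default; converting to bytes is explicit
title = "计算传播学"
encoded = title.encode("utf-8")
print(len(title), "characters,", len(encoded), "bytes")
```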

## Outline of the Course

Each part of this course focuses on a particular package or tool that contributes a fundamental piece of the Python data science story.

1. Jupyter notebook: provides the computational environment.
2. NumPy: provides the ``ndarray`` for efficient storage and manipulation of dense data arrays.
3. Pandas: provides the ``DataFrame`` for efficient storage and manipulation of labeled/columnar data.
4. Matplotlib: provides capabilities for a flexible range of data visualizations.
5. Scikit-Learn: provides efficient & clean Python implementations of the most important and established machine learning algorithms.
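
To give a first taste of the two core data structures in the outline above, here is a small sketch that assumes NumPy and Pandas are already installed. The data and the variable names are invented for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: a dense, homogeneous array with vectorized arithmetic
scores = np.array([3.0, 4.5, 5.0, 2.5])
mean_score = scores.mean()
centered = scores - mean_score      # elementwise, no explicit loop

# Pandas: labeled, column-oriented data built on top of NumPy arrays
df = pd.DataFrame({"user": ["a", "b", "c", "d"], "score": scores})
top = df[df["score"] > mean_score]  # boolean-mask row selection

print(centered)
print(top)
```

The later parts of the course build on exactly these two objects: Matplotlib plots them, and Scikit-Learn models consume them.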

## Using Code Examples

Supplemental material (code examples, figures, etc.) is available for download at http://github.com/jakevdp/PythonDataScienceHandbook/. 


> *The Python Data Science Handbook* by Jake VanderPlas (O’Reilly). Copyright 2016 Jake VanderPlas, 978-1-491-91205-8.



## Installation


I recommend installing Python through the **Anaconda** distribution, which is available for Windows, Linux, and Mac OS X.


The Anaconda distribution comes in two flavors:

- [Miniconda](http://conda.pydata.org/miniconda.html) gives you the Python interpreter itself, along with a command-line tool called ``conda`` which operates as a cross-platform package manager geared toward Python packages, similar in spirit to the apt or yum tools that Linux users might be familiar with.

- [Anaconda](https://www.continuum.io/downloads) includes both Python and conda, and additionally bundles a suite of other pre-installed packages geared toward scientific computing. Because of the size of this bundle, expect the installation to consume several gigabytes of disk space.

## Installation

To get started, download and install the Miniconda package–make sure to choose a version with Python 3–and then install the core packages using the *terminal*:


> **``conda install numpy pandas scikit-learn matplotlib seaborn jupyter``**


For the other, more specialized tools in Python's scientific ecosystem, installation is usually as easy as typing

> **``conda install packagename``**.

For more information on conda, refer to [conda's online documentation](http://conda.pydata.org/docs/).
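
After installing, one way to confirm that the packages can be found is the sketch below, which uses only the standard library. The helper name `check_installed` is hypothetical, and the import names in `PACKAGES` are assumptions about how each installed package is imported (for example, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Assumed import names for the packages installed with conda above
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn", "jupyter"]


def check_installed(names):
    """Return a dict mapping each import name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


status = check_installed(PACKAGES)
for name, ok in status.items():
    print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

Any package reported as `MISSING` can usually be added with another `conda install` in the terminal.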

# Thank You for Your Attention.
