kmader edited this page Feb 24, 2016 · 2 revisions

Welcome to the course wiki

The wiki gives students a place to post questions, suggestions, alternative solutions, and anything else relevant to the course. It will take shape as people begin using it.

Project Ideas / Suggestions

Questions and Answers

Are there office hours for the course?

Office hours will be in ETZ H75 from 2 to 4 pm on Thursdays.

I have little programming / Matlab experience. Is that OK?

There is no need to worry! The exercises will focus on programming in Matlab, and template scripts will be provided for you to modify. You can do this during the exercise session, where you can ask any questions you like, or at home at your own pace. If you have further questions about the exercises, feel free to come by during office hours! As I said earlier, take the first exercise as a chance to play around with Fiji/Matlab to get an idea of what is possible and how it works. To begin with, I advise you to familiarize yourself with handling matrices (how do you access single entries, rows, and columns? how can you find the maximum or calculate the mean value of a matrix/array? how can you rearrange them?) and with displaying data and images (creating and modifying figures; how do you use plot() and subplot()? what do imagesc() and hist() do?). Some of these things are already covered in the first exercise; if not, use the command window to experiment a bit. And keep in mind: Google and Matlab -> Help are very useful tools!
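The exercises can usually be done in other languages as well, so here is a minimal sketch of the matrix operations listed above in plain Python with no external libraries. The corresponding Matlab expressions appear in the comments. The 3x4 "image" is made-up example data, and for real image data in Python, NumPy arrays would be the more usual choice.

```python
from collections import Counter

# Made-up 3x4 "image" stored as a list of rows.
# Note: Python indexes from 0, Matlab indexes from 1.
img = [
    [1, 5, 2, 8],
    [7, 3, 9, 4],
    [6, 0, 2, 5],
]

# Single entry in row 2, column 3 (Matlab: img(2,3))
entry = img[1][2]

# Whole row 2 and whole column 3 (Matlab: img(2,:) and img(:,3))
row = img[1]
col = [r[2] for r in img]

# Maximum and mean over all pixels (Matlab: max(img(:)) and mean(img(:)))
flat = [v for r in img for v in r]
max_val = max(flat)
mean_val = sum(flat) / len(flat)

# Position of the maximum as a 0-based (row, column) pair
max_pos = divmod(flat.index(max_val), len(img[0]))

# Rearranging: transpose (Matlab: img') ...
transposed = [list(c) for c in zip(*img)]
# ... and reshape to 4x3 in row-major order. Matlab's reshape(img, 4, 3)
# works in column-major order and would give a different arrangement.
reshaped = [flat[i:i + 3] for i in range(0, len(flat), 3)]

# Histogram of pixel values in bins of width 5 (bin 0 holds 0-4, bin 1 holds 5-9).
# Matlab's hist() instead spreads a chosen number of equal-width bins over the data range.
hist_counts = Counter(v // 5 for v in flat)

print(entry, row, col)             # 9 [7, 3, 9, 4] [2, 9, 2]
print(max_val, max_pos)            # 9 (1, 2)
print(transposed)                  # [[1, 7, 6], [5, 3, 0], [2, 9, 2], [8, 4, 5]]
```

The same questions (indexing, reductions, rearranging, binning) carry over to any language. Only the indexing base and the memory order used by reshape tend to differ.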

I am good at programming, will this class be boring?

The exercises (both the development of tools and the actual data analysis later on) can usually be done in any language you like. As mentioned earlier, you will always have the opportunity to compare your (numerical) results to ours, though scripted solutions will usually only be available in Matlab. If you already have an application or data set in mind and want to go beyond what we offer in the exercises, feel free to do so! Keep in mind, however, that although we will do our best to help and advise you, we will focus our support on the exercises.
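Results computed in another language will rarely match the Matlab reference values bit for bit, because of different libraries and floating-point rounding. Comparing with a tolerance is therefore usually more meaningful than testing for exact equality. The sketch below assumes the reference values are available as a plain list of numbers. The function name `matches_reference` and the default tolerances are hypothetical choices, not something provided by the course.

```python
import math

def matches_reference(mine, reference, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if every value in `mine` is close to the matching value in `reference`.

    Hypothetical helper: the tolerances are illustrative defaults and should be
    adapted to the precision the exercise actually requires.
    """
    if len(mine) != len(reference):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(mine, reference)
    )

# Made-up values: the "same" quantities computed in two different ways.
reference = [0.1 + 0.2, 2.0 / 3.0]
mine = [0.3, 0.6666666666666666]

print(matches_reference(mine, reference))  # True
print(mine[0] == reference[0])             # False: 0.1 + 0.2 is not exactly 0.3
```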

What about 'fast languages' like C/C++?

As briefly discussed in the first lecture, the computing world has changed dramatically since the early 90s. Cloud-based computing power is very cheap, while good programmers are hard to find and expensive. The speed advantage of 'fast languages' like C/C++ therefore matters much less than it used to, since they are usually more difficult to program in.

  • I have personally done most of my development in Java because it is well supported by 'big data' frameworks like Hadoop, Storm, Spark, and Akka [BigData,HadoopAtCern], has native support for concurrency, and can easily load platform-independent libraries. It is certainly not as fast as C or C++, but writing and testing parallel code in C++ can be a nightmare. Java also allows slower code portions to be rewritten in C/C++ and integrated using JNI, or offloaded to the GPU using OpenCL.
  • Python is similarly useful, but installing packages on cluster machines you do not administer is more difficult, and CPU-bound Python code is harder to parallelize with threads because of the Global Interpreter Lock [GIL].
  • R is nice as well, with fantastic support for statistical analysis, but it has a substantial amount of overhead and at the moment fewer packages available for 'big data'. Its support for large arrays is also less well developed than that of Java or Python.

Alternative Solutions

Other tools / resources

Big Data

"Big Data" itself is a frequently abused term, and it has no fixed size threshold. Rather, it refers to data that is so large, arrives so quickly, and is so diverse that it is difficult to process with 'standard' tools. Handling it therefore requires a new approach that is more distributed, more flexible, and more fault-tolerant.

  1. [BigData] http://blog.samibadawi.com/2013/04/akka-vs-finagle-vs-storm.html
  2. [HadoopAtCern] http://cds.cern.ch/record/1201649/?ln=en
  3. [GIL] https://wiki.python.org/moin/GlobalInterpreterLock

Image Processing Tools

  • FIJI: A distribution of ImageJ with many plugins and libraries included by default. It is the basic visualization tool used in the course and is well supported in the biology and materials science fields.

  • Paraview: A well-developed parallel, distributed image processing, simulation, and visualization tool written in C++. It is open-source and offers many built-in plugins for filtering, segmenting, and classifying data. It has a built-in Python interpreter and can easily be scripted for automated workflows. We will use it in the course to visualize 3D and 4D vector and tensor fields resulting from FIJI or other image processing tools.

  • MeVisLab: An image processing and visualization tool that lets you build image processing workflows graphically. It is well supported on Windows, Mac, and Linux and integrates many commonly used ITK and VTK functions in an easy-to-use way. Its 3D rendering is very good and can combine many different phases in the same output.

  • Knime: A workflow-based image processing tool for building graphical, reproducible workflows for dealing with images.

  • BioImage: A Python-based tool for 2D and 3D image processing. It has nice visualization tools and is well supported in the bio-imaging field.

  • Tango: A Java- and MongoDB-based graphical image processing toolbox designed for cellular data but applicable to many other types of data. Its advantage is that it automatically stores all images, processing scripts, and results in a single database, which can be located anywhere, including the cloud.